Abstract — The quantization of the output of a binary-input 
discrete memoryless channel to a smaller number of levels is 
considered. The optimal quantizer, in the sense of maximizing 
mutual information between the channel input and the quantizer 
output, may be found by an algorithm with complexity which is 
quadratic in the number of channel outputs. This is a concave 
optimization problem, and results from the field of concave 
optimization are invoked. The quantizer design algorithm is a 
realization of a dynamic program. 

Then, this algorithm is applied to the design of message-passing 
decoders for low-density parity-check codes, over arbitrary 
discrete memoryless channels. A general, systematic method to 
find message-passing decoding maps which maximize mutual 
information at each iteration is given. This may be contrasted 
with existing quantized message-passing algorithms, which are 
heuristically derived. The method finds message-passing decoding 
maps similar to those given by Richardson and Urbanke's 
Algorithm E. Using four bits per message, noise thresholds 
similar to belief-propagation decoding are obtained. 

Index Terms — discrete memoryless channel, channel quantiza- 
tion, mutual information maximization, LDPC decoding 



I. Introduction 

The problem of finding good channel quantizers is of impor- 
tance since most communications receivers convert physical- 
world analog values to discrete values. It is these discrete 
values that are used by subsequent filtering, detection and 
decoding algorithms. Since the complexity of circuits which 
implement such algorithms increases with the number of 
quantization levels, it is desirable to use as few levels as 
possible, for some specified error performance. 

Channel capacity is the maximization of mutual informa- 
tion, so a reasonable metric for designing channel quantizers 
is to similarly maximize mutual information between the 
channel input and the quantizer output. For a memoryless 
channel with a fixed input distribution, the quantizer which 
maximizes mutual information will give the highest achievable 
communications rate. Previous work on quantization with a 
mutual information objective function has concentrated on 
continuous-to-discrete quantization with "locally optimal" 
algorithms, as will be explained in the following section. 

Fig. 1: A discrete memoryless channel followed by a quantizer. Given P_{i|j} and K, find 
Q_{k|i} which maximizes I(X; Z). Refer to Sec. II-A for details on notation. 

This paper gives two results for channels with discrete, 
rather than continuous, alphabets. The first result is a 
quadratic-complexity algorithm that finds the quantizer for 
a binary-input discrete memoryless channel (DMC) which 
globally maximizes the mutual information between the chan- 
nel input and the quantizer output. The second result is the 
application of this algorithm to find message-passing decoding 
mappings for low-density parity-check (LDPC) codes which 
maximize mutual information. Both results apply for arbitrary 
binary-input DMCs. 

The first result can be concisely stated as follows: consider 
a DMC with inputs X and outputs Y, as shown in Fig. 1. A 
quantizer Q maps I channel outputs to K quantizer outputs 
Z, with K < I for cases of interest. The set of all possible 
quantizers is denoted by \mathcal{Q}. Under these conditions, the 
following Theorem for arbitrary DMCs holds: 

Theorem. The quantizer Q* which maximizes the mutual 
information between X and Z: 

    Q* = \arg\max_{Q \in \mathcal{Q}} I(X; Z),    (1)

can be found with complexity proportional to I^2, when X is 
binary. 

This maximization (1) is a concave optimization problem, 
and this paper invokes results from the field of concave 
optimization. Concave optimization is NP-hard in general, 
and a naive approach requires complexity exponential in I. 
The significance of the Theorem is that the optimal quantizer 
may be found with complexity quadratic in I. This problem 
superficially resembles several familiar information-theoretic 
problems, such as determination of the rate-distortion function. 
However, as discussed in the following section, these problems 
are distinct because they are convex optimization problems. 

Fig. 2: Simplified illustration of information extrema problems. (a) Mutual information 
is convex in Q_{k|i}. Rate-distortion computation finds the minimum. For the present 
problem, find the maximum. (b) Mutual information is concave in p_j; channel capacity 
computation finds the maximum. 

This optimization problem is formulated as a dynamic 
program, which results in a quantizer design algorithm, which 
is referred to as the Quantization Algorithm. While the al- 
gorithm assuming uniformly-distributed inputs appeared in 
conference proceedings previously [1], contributions of this 
paper include the proof of its optimality and an extension 
to non-uniform input distributions. A necessary condition for 
quantizer optimality, to be given as Lemma 3, is used to 
establish this result. 

The second result of this paper is to demonstrate the utility 
of the Theorem, by applying the Quantization Algorithm to 
the design of message-passing LDPC decoders with a fixed 
message alphabet size. Specifically, this paper describes a 
systematic method to design LDPC decoder message quan- 
tization and message-passing decoding maps for an arbitrary 
DMC and an arbitrary number of quantization levels. This 
method can be viewed from several perspectives: it produces 
non-uniform quantization of decoder messages, it produces 
message-passing decoding maps which maximize mutual in- 
formation, and it compresses the messages. 

The method applies the Quantization Algorithm at each 
step of density evolution; the obtained quantizers are then 
used to generate message-passing decoding maps. These maps 
may then be used to implement a finite-length decoder. They 
do not necessarily correspond to mathematical operations, 
but may be implemented by a look-up table. Numerical 
results show that using four bits per message gives error- 
rate performance similar to full belief-propagation decoding. 
In addition, the maps that this method automatically produces 
are similar to heuristically-derived decoding algorithms, such 
as Algorithm E [3]. 

The flow of the remainder of this paper is as follows. 
Related work is reviewed in Sec. II, and its relationship 
with this paper is stated. Sec. III (a) formalizes the problem 
statement as a concave programming problem, (b) shows that 
the optimal quantizer is deterministic and (c) gives Lemma 3, 
a necessary condition on the optimality of the quantizer. Then, 
Sec. IV gives the Quantization Algorithm, which implements a 
dynamic programming approach to find the optimal quantizer. 

In Sec. V, the method for quantizing LDPC decoder mes- 
sages and finding the associated message-passing decoding 
maps is described. Then, numerical results are reported in 
Sec. VI, showing that noise thresholds for a quantized LDPC 



decoder can be quite close to those for floating-point decoding. 
Some discussion is given in Sec. VII. Proofs and additional 
lemmas are in the Appendix. 

II. Relationship with Prior Work 

This section establishes the relationship between the prob- 
lem studied in this paper and previous results. To do so, first 
notation is briefly given. Then, the relationship with well- 
known information extremum problems is described. After 
that, an overview of work on information-theoretic channel 
quantizer design is given. Finally, a sampling of related work 
on LDPC message quantization is given. 
A. Notation 

Here, the notation of Sec. I is formalized. The alphabet sizes 
of X, Y and Z are J, I and K, respectively. Define as follows: 

    p_j = \Pr(X = j),    with j = 1, ..., J, 
    r_i = \Pr(Y = i),    with i = 1, ..., I, 
    q_k = \Pr(Z = k),    with k = 1, ..., K, 
    P_{i|j} = \Pr(Y = i | X = j), 
    Q_{k|i} = \Pr(Z = k | Y = i), and 
    T_{k|j} = \Pr(Z = k | X = j) = \sum_i Q_{k|i} P_{i|j}. 

Except for Lemma 2 (given in the next section), this paper 
makes the restriction that J = 2. The sum \sum_j, etc. means the 
sum over the whole alphabet \sum_{j=1}^{J}, etc. 

The mutual information between two variables, say X and 
Z, is: 

    I(X; Z) = \sum_k \sum_j p_j T_{k|j} \log \frac{T_{k|j}}{\sum_{j'} p_{j'} T_{k|j'}}.    (2)

It is well-known that mutual information is convex (lower 
convex) in T_{k|j}, for fixed p_j. Similarly, it is concave (upper 
convex) in p_j for fixed T_{k|j} [4, Theorem 2.7.4]. 
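
As a numerical companion to (2), the following is a minimal sketch of how I(X; Z) can be evaluated from p_j, P_{i|j} and Q_{k|i}, using T = QP. The function name `mutual_information`, the use of base-2 logarithms (bits), and the binary symmetric channel used as an example are assumptions made for illustration; they are not taken from the paper.

```python
import math

def mutual_information(p, P, Q):
    """Return I(X;Z) in bits, following (2) with T = QP.

    p[j]    : Pr(X = j)
    P[i][j] : Pr(Y = i | X = j)   (I x J)
    Q[k][i] : Pr(Z = k | Y = i)   (K x I)
    """
    J, I, K = len(p), len(P), len(Q)
    total = 0.0
    for k in range(K):
        # T[j] = Pr(Z = k | X = j) = sum_i Q[k][i] * P[i][j]
        T = [sum(Q[k][i] * P[i][j] for i in range(I)) for j in range(J)]
        q_k = sum(p[j] * T[j] for j in range(J))  # Pr(Z = k)
        for j in range(J):
            if p[j] > 0 and T[j] > 0:  # zero-probability terms contribute 0
                total += p[j] * T[j] * math.log2(T[j] / q_k)
    return total

# Hypothetical example: binary symmetric channel with crossover 0.1.
p = [0.5, 0.5]
P = [[0.9, 0.1],
     [0.1, 0.9]]
identity = [[1, 0], [0, 1]]   # Z = Y
merge_all = [[1, 1]]          # every channel output mapped to one level
mi_identity = mutual_information(p, P, identity)
mi_merged = mutual_information(p, P, merge_all)
```

With the identity quantizer the value is the mutual information of the channel itself, and merging all outputs into one level gives zero, which is a quick sanity check of the indexing.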

The quantizer is given by Q_{k|i}, k = 1, ..., K and i = 
1, ..., I, and may be regarded as a K × I matrix. The set of 
possible quantizers \mathcal{Q} consists of all K × I stochastic matrices, 
that is, with 

    \Pi = \{ (\pi_1, ..., \pi_K) : \sum_{k=1}^{K} \pi_k = 1, \pi_k \ge 0 \},    (3)

then 

    \mathcal{Q} = \Pi^I.    (4)



B. Superficially Related Problems 

Superficially, the information maximization problem, restated 
here for convenience, 

    Q* = \arg\max_{Q \in \mathcal{Q}} I(X; Z),    (5)

appears similar to various information-theoretic optimization 
problems, particularly the computation of the rate-distortion 
function, 

    R(D) = \min_{Q \in \mathcal{Q}} I(Y; Z),    (6)

but it is distinct. Mutual information is convex in Q, and for 
the computation of R(D), mutual information is minimized, 
so this is a convex optimization problem [4, Sec. 13.7]. On the 
other hand, the channel quantization problem is to maximize 
mutual information in Q, leading to a considerably different 
concave optimization problem. Although X was replaced with 
Y to make the comparison, as discussed in the following 
section, this affine transform does not change the convexity of 
mutual information. Also, the distortion constraint was omitted 
for clarity. 

For the computation of the DMC capacity, mutual information 
is maximized over the input distribution p_j, in which it is 
concave, while the channel transition probabilities are fixed. 
For problem (1), the maximization is in Q_{k|i} while the p_j 
are fixed; this is also distinct. The relationship between the 
rate-distortion problem, the DMC capacity, and the problem 
treated in this paper is illustrated in Fig. 2. The well-known 
algorithm to compute the channel capacity and rate-distortion 
function, given by Arimoto [5] and Blahut [6], is a convex 
optimization method. 

Another information extremum problem is the information 
bottleneck method, from the field of machine learning [7]. 
The problem setup is identical, using the same Markov chain 
X → Y → Z. However, this is an information minimization 
problem, using a Lagrange multiplier to sweep a kind of 
rate-distortion curve. Moreover, it is a convex optimization method, 
using alternating minimization. 

C. Information-Theoretic Quantizer Design Criteria 

The channel cutoff rate, another information theoretic mea- 
sure, was suggested as a criterion for designing quantizers for 
continuous-output channels, by Wozencraft and Kennedy in 
the 1960s [8]; see also [9, Sec. 6.2]. Massey went on to empha- 
size that the cutoff rate was superior to the probability of error 
as a receiver design criterion, and gave an algorithm to find 
a channel quantizer which maximizes the cutoff rate for the 
binary-input AWGN channel [10]. Lee extended these results 
to channels with non-binary inputs, and gave an algorithm to 
find decision regions for continuous-output channels [11]. If 
the metrics for each output are restricted to integers, this can 
reduce the decoder complexity, and again can be designed 
using the cutoff rate [12]. The channel cutoff rate has been 
used to quantize turbo decoders, as well [13]. 

While the cutoff rate is an important information theoretic 
measure, mutual information is a more appropriate measure, 
since the channel capacity, which is above the cutoff rate, 
can now be practically approached with LDPC codes. Also, 
the restriction to integer metrics is practical when decoding 
operations such as additions and multiplications are used; 
however, the LDPC decoder described in this paper does not 
use mathematical operations. Instead, decoding mappings could 
be implemented by lookup tables. 



More recently, maximization of mutual information has 
been considered as a criterion for channel quantizers. Most 
papers emphasize continuous-to-discrete quantization of the 
AWGN channel. The earliest work we are aware of is the 2002 
conference paper of Ma et al., which considered quantization 
of the binary-input AWGN channel [14]. For the special case 
of three quantizer outputs, it is straightforward to select a 
single parameter which maximizes mutual information. How- 
ever, for a larger number of outputs, local optimization is 
feasible and this has higher mutual information than uni- 
form quantization [15]. Singh et al. considered the problem 
of jointly finding capacity-achieving input distributions and 
AWGN channel quantizers [16]. Again, for an AWGN channel 
quantized to 3 levels, optimization over a single parameter was 
done, but for a larger number of outputs, a local optimization 
algorithm was used. In fact, this is a concave-convex problem, 
and global optimization appears difficult. 

While these locally optimal quantization algorithms appear 
to be effective, the subject of this paper is a globally optimal 
quantization. Also, whereas previous work concentrated on 
continuous output (and usually symmetrical) channels, the 
results here are obtained by a different approach: by working 
with quantized channels, rather than working directly with 
continuous output channels. Of course, a continuous output 
channel can be approximated with arbitrarily small discrepancy 
by a finely quantized channel. While previous papers 
concentrated on AWGN channels, the results in this paper 
hold for arbitrary DMCs. Thus, we believe that this is the first 
result on globally optimal quantization of general binary-input 
channels. 

In addition, as far as we are aware, all previous work a 
priori assumed deterministic quantizers were optimal. In this 
paper, it is shown that this assumption is valid, in Lemma 2. 

D. Design of Quantized LDPC Decoders 

The quantization of LDPC decoder messages is of great 
practical importance, and has received substantial attention in 
both the communication theory and VLSI literature. Belief- 
propagation decoding of LDPC codes uses real numbers for 
decoder messages, but most real-world implementations must 
quantize these real numbers, which motivates the design of 
quantized LDPC decoders. See [17] for a recent example. 

An efficient decoding approach is to use a bit-flipping 
algorithm, such as Gallager B for the binary symmetric 
channel, which uses only one bit per message. There are 
numerous variations on bit-flipping decoding for continuous- 
output channels [18]. But using one bit per message has a 
performance penalty. Algorithm E uses three-level decoder 
messages (0, 1 and erasure) to decode LDPC codes transmitted 
over the binary symmetric channel (BSC). The algorithm 
assigns an iteration-dependent weighting factor to the channel 
value, selected by maximizing mutual information [3]. Using 
two bits per message can further improve performance while 
being suitable for analysis and decoding on graphs with cycles 
[19]. Quantized LDPC belief-propagation decoders can be 
designed by considering mutual information, resulting in non- 
uniform message quantization, and it has been shown that 



using four bits per message is quite close to unquantized per- 
formance [20]. This is significant since conventional uniform 
quantization requires about six bits per message to achieve 
similar error performance. 

The LDPC decoding method described in this paper is in 
some sense a generalization of these previous works, where 
reasonable heuristics were used to specify the message-passing 
decoding rules, sometimes considering mutual information. 
But in this paper, no assumptions are made about the decoding 
rules. Instead, maximization of mutual information and the 
number of quantization levels are the only design criteria, 
and the specific message-passing decoding maps are found 
using the Quantization Algorithm, presented in Sec. IV. The 
method is similar to the one previously presented [2], although 
a greedy algorithm was used for quantization. The distinctions 
of this paper include the use of the efficient Quantization 
Algorithm and the handling of asymmetric channels. 

III. Concave Optimization and A Necessary 
Condition for Optimality 

A. Concave Optimization 

Concave optimization, also known as concave programming 
or concave minimization, is a class of mathematical programming 
problems which has the general form: 

    \min f(x), subject to x \in S,    (7)

where S \subset \mathbb{R}^n is a feasible region and f(x) is a concave 
function [21] [22]. There is substantial literature on concave 
optimization, and it is generally considered to be computationally 
more difficult than convex optimization. General concave 
optimization is NP-hard; more efficient methods may be found, 
but are problem dependent, as is the case in this paper. In 
concave optimization, there may be multiple local minima, which 
is distinct from convex optimization. The following is an 
important result from the field of concave optimization: 

Lemma 1. [22, Theorem 1.19] A concave (convex) function 
f : S \to \mathbb{R} attains its global minimum (maximum) over S at 
an extreme point of S. 

The literature on concave optimization uses minimization 
of a concave function, but in the remainder of this paper, the 
equivalent maximization of a convex function is used. 

If S is a polytope, as in this paper, then the extreme points 
are its vertices. Lemma 1 can be visualized in one dimension 
in Fig. 2-(a), where it is clear that the maximum must be at 
either endpoint 0 or 1, that is, the vertices of the line segment 
that is the feasible region. 

The objective of the maximization in (1) is given by: 

    I(X; Z) = \sum_k \sum_j p_j \sum_i Q_{k|i} P_{i|j} \log \frac{\sum_{i'} Q_{k|i'} P_{i'|j}}{\sum_{j'} p_{j'} \sum_{i'} Q_{k|i'} P_{i'|j'}}.    (8)

The relationship T_{k|j} = \sum_i Q_{k|i} P_{i|j} is an affine transform. 
If these conditional probability distributions are represented by 
matrices T, Q, P, then the affine transform is T = QP, where 
P transforms Q (in one vector space) to T (in another vector 
space). If a function is convex, then it is also convex in an 
affine transform of the original arguments [23, Sec. 3.2]. As 
stated earlier, mutual information is convex in T_{k|j}, so mutual 
information as expressed by (8) is convex in Q_{k|i} as well. 

Thus, the optimization problem given by (1) is maximization 
of a convex function, expressed as a concave programming 
problem over the I · K variables Q_{k|i}: 

    \max_Q \sum_k \sum_j p_j \sum_i Q_{k|i} P_{i|j} \log \frac{\sum_{i'} Q_{k|i'} P_{i'|j}}{\sum_{j'} p_{j'} \sum_{i'} Q_{k|i'} P_{i'|j'}} 

subject to: 

    \sum_k Q_{k|i} = 1,    i = 1, ..., I, and 
    Q_{k|i} \ge 0,    i = 1, ..., I and k = 1, ..., K.    (9)

The constraints enforce that Q_{k|i} is a conditional probability 
distribution. 



B. The Optimal Quantizer Is Deterministic 
This section demonstrates the following: 

Lemma 2. For any DMC and any K, the optimal quantizer 
Q* is deterministic. That is, Q*_{k|i} \in {0, 1}, for all i and k. 

The feasible region in (9) is a polytope. According to 
Lemma 1, the objective attains its global maximum at 
one of the polytope vertices. To prove Lemma 2, it is sufficient 
to show that the extreme points, or vertices, are at Q_{k|i} = 0 
or 1. 

Consider the portion of the polytope for some fixed i. 
The vertices of this part of the feasible region are where 
the hyperplane \sum_k Q_{k|i} = 1 intersects the K half-spaces 
Q_{k|i} \ge 0. These K vertices are: 

    {(1, 0, ..., 0), 
     (0, 1, ..., 0), 
     ..., 
     (0, 0, ..., 1)}. 

For each i = 1, ..., I, these K vertices may be selected 
independently and all vertices are at 0 or 1. By Lemma 1, 
the solution is at one of the K^I vertices, which completes the 
proof. Note that Lemma 2 holds for an arbitrary number of 
inputs J. 

As an example to illustrate Lemma 2, consider the binary, 
symmetric errors and erasure channel, with the transition 
matrix: 

    P = \begin{bmatrix} 1-p-q & q \\ p & p \\ q & 1-p-q \end{bmatrix},    (10)


for p, q > 0 and p + q < 1. Suppose the three outputs, called 
0, erasure and 1, are to be quantized to two levels. One might 
expect that symmetry should be maintained by mapping the 
erasure symbol to the two output symbols with probability 
0.5 each. However, as Lemma 2 shows, this probabilistic 
assignment has lower mutual information than mapping the 
erasure symbol to either 0 or 1 with probability one. This 
optimal quantizer lacks symmetry between the channel input 
and quantizer output. 
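
The erasure example can be checked numerically. The sketch below is illustrative only: it assumes that p is the erasure probability and q the error probability, picks the hypothetical values p = 0.2 and q = 0.1 with uniform inputs, and measures mutual information in bits. The helper name `mi` is not from the paper.

```python
import math

def mi(p, P, Q):
    """I(X;Z) in bits for p[j], P[i][j] = Pr(Y=i|X=j), Q[k][i] = Pr(Z=k|Y=i)."""
    total = 0.0
    for row in Q:
        # T[j] = Pr(Z = k | X = j) for the quantizer output k of this row
        T = [sum(row[i] * P[i][j] for i in range(len(P))) for j in range(len(p))]
        q_k = sum(pj * t for pj, t in zip(p, T))
        total += sum(pj * t * math.log2(t / q_k)
                     for pj, t in zip(p, T) if pj > 0 and t > 0)
    return total

# Assumed values: erasure probability 0.2, error probability 0.1, uniform inputs.
pe, qe = 0.2, 0.1
p = [0.5, 0.5]
P = [[1 - pe - qe, qe],        # channel output "0"
     [pe, pe],                 # channel output "erasure"
     [qe, 1 - pe - qe]]        # channel output "1"

Q_split = [[1, 0.5, 0], [0, 0.5, 1]]   # erasure sent to each level w.p. 0.5
Q_to_0 = [[1, 1, 0], [0, 0, 1]]        # erasure merged with output "0"
Q_to_1 = [[1, 0, 0], [0, 1, 1]]        # erasure merged with output "1"

mi_split = mi(p, P, Q_split)
mi_to_0 = mi(p, P, Q_to_0)
mi_to_1 = mi(p, P, Q_to_1)
print(mi_split, mi_to_0, mi_to_1)
```

For these assumed values, the two deterministic assignments give the same mutual information, and both exceed the symmetric probabilistic assignment, as Lemma 2 predicts.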



A naive optimization approach for (9) would be to search 
over all K^I candidate solutions, which is searching over 
all deterministic quantizers. This has complexity which is 
exponential in I. However, as will be shown in the sequel, 
a more efficient algorithm exists. 
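
For very small I, the naive search can be written directly, which is useful as a reference when checking a faster implementation. This is a sketch under assumptions: the names `mi_of_mapping` and `brute_force_quantizer` are hypothetical, the example channel is the erasure channel with assumed values 0.2 (erasure) and 0.1 (error), and mutual information is in bits.

```python
import itertools
import math

def mi_of_mapping(p, P, mapping, K):
    """I(X;Z) in bits when channel output i is mapped to level mapping[i]."""
    total = 0.0
    for k in range(K):
        # T[j] = Pr(Z = k | X = j): sum of P[i][j] over the preimage of k
        T = [sum(P[i][j] for i in range(len(P)) if mapping[i] == k)
             for j in range(len(p))]
        q_k = sum(pj * t for pj, t in zip(p, T))
        total += sum(pj * t * math.log2(t / q_k)
                     for pj, t in zip(p, T) if pj > 0 and t > 0)
    return total

def brute_force_quantizer(p, P, K):
    """Search all K**I deterministic quantizers; cost is exponential in I."""
    best_mi, best_map = -1.0, None
    for mapping in itertools.product(range(K), repeat=len(P)):
        value = mi_of_mapping(p, P, mapping, K)
        if value > best_mi:
            best_mi, best_map = value, mapping
    return best_mi, best_map

p = [0.5, 0.5]
P = [[0.7, 0.1], [0.2, 0.2], [0.1, 0.7]]   # outputs "0", "erasure", "1"
best_mi, best_map = brute_force_quantizer(p, P, K=2)
```

The loop visits K^I mappings, so it is only practical for a handful of channel outputs.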



C. Necessary Condition for Optimality 

From Lemma 2, for each channel output i, there is exactly 
one value k', for which Q_{k'|i} = 1, and for all other values of k, 
Q_{k|i} = 0. For a given quantizer output k, let A_k be the set of 
values i for which Q_{k|i} = 1. The quantizer is a mapping from 
channel outputs to quantizer outputs. Under this mapping, A_k 
is the preimage of k. The sets A_m and A_n are disjoint for 
m \ne n, and the union of all the sets is {1, 2, ..., I}. 

Lemma 3 describes a necessary condition for the quantizer 
to be optimal. It is key for proving that the Quantization Al- 
gorithm produces the optimal quantizer. The proof of Lemma 
3 is given in the Appendix. 

Lemma 3. For the quantizer which maximizes mutual information, 
the preimage A_k consists of consecutive channel 
outputs, for each k = 1, ..., K, when the channel outputs are 
sorted according to: 

    \frac{P_{1|1}}{P_{1|2}} < \frac{P_{2|1}}{P_{2|2}} < \cdots < \frac{P_{I|1}}{P_{I|2}}.    (11)

Note that there is no loss of generality because the outputs 
can always be re-labeled such that (11) holds. In addition, 
the log-likelihood ratio for channel output i is \log(P_{i|1}/P_{i|2}). 
Since the log function is monotonic, the sorting condition is 
equivalent to sorting the log-likelihood ratios. 

Strict inequalities are used in (11) because if 

    \frac{P_{i|1}}{P_{i|2}} = \frac{P_{i+1|1}}{P_{i+1|2}},    (12)

then outputs i and i+1 can be combined to a single output with 
the likelihood P_{i|j} + P_{i+1|j} for input j to form a new channel 
with I - 1 outputs. The likelihood ratio for the combined 
output, 

    \frac{P_{i|1} + P_{i+1|1}}{P_{i|2} + P_{i+1|2}},    (13)

is equal to (12). In addition, the original channel and the new 
channel have the same mutual information. The use of strict 
inequalities simplifies the proofs in the Appendix. 
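
The re-labeling and merging step can be sketched as follows. The function name `sort_and_merge`, the tolerance `tol`, and the choice to drop outputs that have zero probability under both inputs are assumptions for illustration; the paper does not prescribe an implementation.

```python
import math

def sort_and_merge(P, tol=1e-12):
    """Sort outputs by P[i][0]/P[i][1] and merge outputs with equal ratios.

    P[i] = (Pr(Y=i|X=1), Pr(Y=i|X=2)). Returns a new list of rows whose
    likelihood ratios are strictly increasing, as (11) requires.
    """
    # atan2(a, b) increases with a/b for a, b >= 0 and also handles b == 0.
    rows = sorted((r for r in P if r[0] > 0 or r[1] > 0),
                  key=lambda r: math.atan2(r[0], r[1]))
    merged = []
    for a, b in rows:
        if merged:
            c, d = merged[-1]
            # Equal ratios a/b == c/d, tested by cross-multiplication as in (12).
            if abs(a * d - c * b) <= tol:
                merged[-1] = (c + a, d + b)   # combined likelihoods, as in (13)
                continue
        merged.append((a, b))
    return merged

# Hypothetical channel: the ratios are 0.25, 2, 1, 2, so two outputs merge.
P = [(0.1, 0.4), (0.2, 0.1), (0.3, 0.3), (0.4, 0.2)]
sorted_P = sort_and_merge(P)
```

In this example the two outputs with likelihood ratio 2 are combined, leaving a channel with three outputs.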

Remark 1. One might expect that the sorting condition 
(11) should depend upon the input distribution p_j, but it is 
independent. In fact, the sorting condition can be rewritten as: 

    \frac{p_1 P_{1|1}}{p_2 P_{1|2}} < \frac{p_1 P_{2|1}}{p_2 P_{2|2}} < \cdots < \frac{p_1 P_{I-1|1}}{p_2 P_{I-1|2}} < \frac{p_1 P_{I|1}}{p_2 P_{I|2}},    (14)

which is equivalent to (11). Thus, the condition introduced 
for uniformly-distributed inputs previously [1] is valid for 
arbitrary input distributions as well. 



Thus, interest may be restricted to the cases where A_1 is 
the set {1, 2, ..., a_1}, A_2 is the set {a_1 + 1, ..., a_2}, etc., 
and A_K is the set {a_{K-1} + 1, ..., I}, with 

    1 \le a_1 < a_2 < \cdots < a_{K-1} < I.    (15)

Each A_k has at least one element. For convenience, let a_0 = 0 
and a_K = I. While there are K^I possible deterministic 
quantizers, the optimal quantizer is contained in a subset, and 
the Quantization Algorithm searches this subset. 

IV. Quantization Algorithm 

Lemma 3 states that a necessary condition for quantizer 
optimality is that the preimage of each quantizer output 
consists of consecutive channel outputs. For a given channel, 
finding the optimal quantizer reduces to finding the boundaries 
a_1, a_2, ..., a_{K-1} which maximize the mutual information. 
This section describes an algorithm which finds those 
boundaries. First, partial mutual information is described. 
A. Partial Mutual Information 

A partial sum of mutual information, called "partial mutual 
information," is used both in the Quantization Algorithm and 
in the proofs in the Appendix. Partial mutual information \iota is 
the contribution that one or more quantizer outputs makes to 
the total mutual information. For a deterministic quantizer, the 
total mutual information is: 

    I(X; Z) = \sum_k \sum_j p_j \sum_{i \in A_k} P_{i|j} \log \frac{\sum_{i' \in A_k} P_{i'|j}}{\sum_{j'} p_{j'} \sum_{i' \in A_k} P_{i'|j'}}, 

since Q_{k|i} = 1 if and only if i \in A_k. 

Under the quantization mapping from channel outputs to 
quantizer outputs, the preimage of quantizer output m is A_m. 
The partial mutual information \iota_m for this output is: 

    \iota_m = \sum_j p_j \sum_{i \in A_m} P_{i|j} \log \frac{\sum_{i' \in A_m} P_{i'|j}}{\sum_{j'} p_{j'} \sum_{i' \in A_m} P_{i'|j'}},    (16)

so total mutual information is the sum of all the partial mutual 
information terms: 

    I(X; Z) = \sum_k \iota_k.    (17)

Further, let consecutive channel outputs a' + 1 to a, with 
a' < a \le I, be assigned to a single quantizer output. Denote 
by \iota(a' \to a) the partial mutual information: 

    \iota(a' \to a) = \sum_j p_j \sum_{i=a'+1}^{a} P_{i|j} \log \frac{\sum_{i'=a'+1}^{a} P_{i'|j}}{\sum_{j'} p_{j'} \sum_{i'=a'+1}^{a} P_{i'|j'}}.    (18)

So if A_k = {a_{k-1} + 1, ..., a_k}, then \iota_k = \iota(a_{k-1} \to a_k). 
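
Because the logarithm in (18) does not depend on the summation index i, the partial mutual information only needs the cumulative sums of P_{i|j}. The sketch below makes that explicit. The names `prefix_sums` and `partial_mi`, the base-2 logarithm, and the example channel are assumptions for illustration.

```python
import math

def prefix_sums(P, J=2):
    """C[j][a] = sum of P[i][j] over channel outputs 1..a, with C[j][0] = 0."""
    I = len(P)
    C = [[0.0] * (I + 1) for _ in range(J)]
    for j in range(J):
        for a in range(1, I + 1):
            C[j][a] = C[j][a - 1] + P[a - 1][j]
    return C

def partial_mi(p, C, a_prev, a):
    """Partial mutual information (18), in bits, for outputs a_prev+1 .. a."""
    # s[j] = Pr(Y in {a_prev+1, ..., a} | X = j), read off the prefix sums
    s = [C[j][a] - C[j][a_prev] for j in range(len(p))]
    q = sum(pj * sj for pj, sj in zip(p, s))   # probability of this quantizer output
    return sum(pj * sj * math.log2(sj / q)
               for pj, sj in zip(p, s) if pj > 0 and sj > 0)

# Hypothetical example: binary symmetric channel with crossover 0.1.
p = [0.5, 0.5]
P = [[0.9, 0.1], [0.1, 0.9]]
C = prefix_sums(P)
separate = partial_mi(p, C, 0, 1) + partial_mi(p, C, 1, 2)   # one level per output
together = partial_mi(p, C, 0, 2)                            # both outputs in one level
```

Summing the terms for a partition into single outputs recovers the channel's mutual information, as in (17), while a single level containing every output contributes zero.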



B. Quantization Algorithm 

The Quantization Algorithm is a quantizer design algorithm 
and is the realization of a dynamic program. The algorithm 
has a state value S_k(i), which is the maximum partial mutual 
information when channel outputs 1 to i are quantized to 
quantizer outputs 1 to k. This can be computed recursively 
by conditioning on the state value at time index k - 1: 

    S_k(a) = \max_{a'} ( S_{k-1}(a') + \iota(a' \to a) ),    (19)

where the maximization is taken over a' \in {k - 1, ..., a - 1}. 
Clearly, S_K(I) is the maximum total mutual information. The 
path S_0(0), S_1(a_1), ..., S_K(a_K) which gives the maximum 
total mutual information corresponds to the optimum quantizer 
whose boundaries are {a_1, a_2, ..., a_{K-1}}. The relationship 
between the state metrics is illustrated in a trellis-type diagram 
in Fig. 3, for I = 5 and K = 3. 

Fig. 3: Trellis-type illustration showing the relationship between state metrics for I = 5 
and K = 3. 

The Quantization Algorithm follows. It will be convenient 
to denote this algorithm as Q* = Quant(P_{i|j}, K), where Q* 
is a K × I matrix. 

Quantization Algorithm 

1) Inputs: 

• Binary-input discrete memoryless channel P_{i|j}. If 
necessary, modify labels to satisfy (11). 

• The number of quantizer outputs K. 

2) Initialize S_0(0) = 0. 

3) Precompute partial mutual information. For each a' \in 
{0, 1, ..., I - 1} and for each a \in {a' + 1, ..., t} (where 
t = \min(a' + 1 + I - K, I)): 

• compute \iota(a' \to a) according to (18). 

4) Recursion. For each k \in {1, ..., K}, and for each a \in 
{k, ..., k + I - K}, 

• compute S_k(a) according to (19), 

• store the local decision h_k(a): 

    h_k(a) = \arg\max_{a'} ( S_{k-1}(a') + \iota(a' \to a) ), 

where the maximization is taken over a' \in {k - 1, ..., a - 1}. 

5) Find the optimal quantizer by traceback. Let a*_K = I. 
For each k \in {K - 1, K - 2, ..., 1}: 

    a*_k = h_{k+1}(a*_{k+1}). 

6) Outputs: 

• The optimal a*_1, a*_2, ..., a*_{K-1}. Equivalently, output 
the matrix Q*, where row k of Q* has ones in 
columns a*_{k-1} + 1 to a*_k and zeros in all other 
columns. 

• The maximum mutual information, S_K(I). 

Fig. 4: Optimal quantization to K = 8 levels of a DMC derived from a finely- 
quantized binary-input AWGN channel, with noise variance \sigma^2. Solid and dotted lines 
show quantization boundaries when the AWGN channel is quantized to I = 500 and 
I = 30 levels, respectively. 

There may be multiple optimal quantizers. In an implemen- 
tation, storage of the local decision and traceback should deal 
with ties. This was not explicitly indicated, to keep the notation 
simple. 
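
The steps of the Quantization Algorithm can be sketched in a few lines. This is an illustrative sketch rather than the authors' released source code: the function name `quantize` is hypothetical, mutual information is measured in bits, ties are broken by keeping the first maximizer, and the partial mutual information of (18) is evaluated on the fly from prefix sums instead of being stored in a table as in step 3. The rows of P are assumed to be already sorted to satisfy (11).

```python
import math

def quantize(p, P, K):
    """Dynamic-programming sketch of the Quantization Algorithm.

    p[j]    : input distribution (J = 2)
    P[i][j] : Pr(Y = i+1 | X = j+1), rows sorted to satisfy (11)
    K       : number of quantizer outputs, K <= I
    Returns ([a_1, ..., a_{K-1}], maximum mutual information in bits).
    """
    I, J = len(P), len(p)
    # Prefix sums: C[j][a] = sum of P_{i|j} over channel outputs 1..a.
    C = [[0.0] * (I + 1) for _ in range(J)]
    for j in range(J):
        for a in range(1, I + 1):
            C[j][a] = C[j][a - 1] + P[a - 1][j]

    def iota(a_prev, a):
        """Partial mutual information (18) for outputs a_prev+1 .. a."""
        s = [C[j][a] - C[j][a_prev] for j in range(J)]
        q = sum(p[j] * s[j] for j in range(J))
        return sum(p[j] * s[j] * math.log2(s[j] / q)
                   for j in range(J) if p[j] > 0 and s[j] > 0)

    S = [[float("-inf")] * (I + 1) for _ in range(K + 1)]   # state values S_k(a)
    h = [[0] * (I + 1) for _ in range(K + 1)]               # local decisions h_k(a)
    S[0][0] = 0.0                                           # step 2
    for k in range(1, K + 1):                               # step 4, recursion (19)
        for a in range(k, k + I - K + 1):
            for a_prev in range(k - 1, a):
                cand = S[k - 1][a_prev] + iota(a_prev, a)
                if cand > S[k][a]:
                    S[k][a], h[k][a] = cand, a_prev

    bounds = [0] * (K + 1)                                  # step 5, traceback
    bounds[K] = I
    for k in range(K - 1, 0, -1):
        bounds[k] = h[k + 1][bounds[k + 1]]
    return bounds[1:K], S[K][I]

# Hypothetical example: errors-and-erasure channel, sorted by likelihood ratio.
p = [0.5, 0.5]
P = [[0.1, 0.7], [0.2, 0.2], [0.7, 0.1]]
boundaries, best_mi = quantize(p, P, K=2)
```

For this three-output channel and K = 2 there are two optimal quantizers (the erasure output can join either neighbor), so the returned boundary depends on the tie-breaking rule.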

The main computational burden is to pre-compute \iota(a' \to a) 
in step 3. Since a' is from a set of size I and a is from a set 
of size at most I - K + 1, the number of \iota computations is 
proportional to I^2. Note that in (18), the sum on i could be 
over as many as I - K + 1 terms. However, since this sum can 
be computed recursively, the complexity remains proportional 
to I^2. 

Also, for each k in step 4, roughly (1/2)(I - K)^2 add/compare 
operations are needed, and there are K such steps. This results 
in roughly (1/2) K (I - K)^2 operations. For fixed K, this is also 
proportional to I^2. Note that if I is not much larger than 
K, then (I - K) is close to zero. Thus, the computational 
complexity is quadratic in I. 

This complexity result, along with the proof of optimality 
in the Appendix, proves the Theorem. 

Source code which implements the Quantization Algorithm, 
as well as the density evolution method in the next section, is 
available [24]. 



C. Finely Quantized Continuous-Output Channel 

While the Quantization Algorithm can only be applied 
to discrete channels, it can be used to obtain good coarse 
quantization of a continuous-output channel, by first using 
fine quantization. This can be illustrated for the binary-input 
AWGN channel with ±1 inputs and Gaussian noise variance 
\sigma^2. Fig. 4 was created by first uniformly quantizing the 
channel between -2 and +2 with I = 500 or I = 30 steps. 
Then, the Quantization Algorithm is applied with K = 8 
quantizer outputs. The figure shows the K - 1 = 7 quantization 
boundaries when the AWGN channel was finely quantized 
with I = 500 (solid line) and I = 30 (dashed line). 
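
The fine quantization step can be sketched as below. Several details are assumptions, since the text does not state them: the function name `awgn_dmc`, the mapping of input 1 to +1 and input 2 to -1, and the choice to let the two outermost cells absorb the Gaussian tails beyond -2 and +2.

```python
import math

def awgn_dmc(sigma2, I, lo=-2.0, hi=2.0):
    """Return P[i][j] for a binary-input AWGN channel quantized to I uniform cells.

    Assumptions: input j = 0 is sent as +1 and j = 1 as -1; the first and last
    cells extend to -inf and +inf so that each column sums to one.
    """
    sigma = math.sqrt(sigma2)

    def cdf(y, mean):
        # Gaussian CDF with the given mean and standard deviation sigma
        return 0.5 * (1.0 + math.erf((y - mean) / (sigma * math.sqrt(2.0))))

    edges = [lo + (hi - lo) * i / I for i in range(I + 1)]
    edges[0], edges[-1] = -math.inf, math.inf
    means = (+1.0, -1.0)
    return [[cdf(edges[i + 1], m) - cdf(edges[i], m) for m in means]
            for i in range(I)]

# Hypothetical noise variance; the paper's figure uses its own value.
P = awgn_dmc(sigma2=0.5, I=30)
```

With this input mapping the likelihood ratio P[i][0]/P[i][1] increases from the leftmost cell to the rightmost, so the rows are already ordered as (11) requires and can be passed to a quantizer design routine.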

V. Application to Decoding LDPC Codes 

A. Overview 

In this section, the Quantization Algorithm is used to find 
decoding mappings for LDPC decoders. Alternatively, this 
may be viewed as finding optimum non-uniform quantization, 
or as efficient message compression. 

No assumptions are made about the message-passing maps. 
Instead, maps are derived from the quantizer generated by the 
algorithm. As a result, the decoding maps maximize mutual 
information. The "channel" being quantized is not a model of a 
physical communication system, but a conditional probability 
density on the LDPC decoder messages. In particular, the 
LDPC code bits will play the role of X and the decoder 
messages will play the role of Z. A cross-product distribution 
will play the role of Y. 

The maps are generated in the context of density evolution 
[3], and an overview of this method is shown in Fig. 5. In 
density evolution, at each degree-d node and each iteration, 
a given conditional input distribution is used to determine 
an output distribution. Each of d - 1 incoming messages 
is a discrete distribution with K values. An intermediate 
(cross-product) message distribution with K^{d-1} values is 
created, and is simply the cross-product of the incoming 
message distributions. The key step comes when we apply 
the Quantization Algorithm to this cross-product distribution, 
which produces a quantizer that reduces the distribution to 
K values. This quantizer is then used to find the message- 
passing decoding map. These decoding mappings are locally 
optimal, in the sense of maximizing mutual information at 
each decoding iteration; we cannot say anything about global 
optimality over all iterations. Also, by using density evolution, 
finite-length effects can be ignored, and only the channel, the 
LDPC node degrees and the value K are required as inputs to 
this procedure. 
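
The cross-product step can be illustrated for messages that are conditionally independent given the code bit, which is an assumption made here for illustration (it matches a variable node on a cycle-free graph; check-node processing combines bits differently and is not covered). The function name `cross_product` and the example distributions are hypothetical.

```python
import itertools

def cross_product(dists):
    """Cross-product of conditionally independent message distributions.

    dists[n][x][y] = Pr(message n = y | X = x), for x in {0, 1}.
    Returns (labels, joint) where labels[m] is a tuple of message values and
    joint[x][m] = product over n of dists[n][x][labels[m][n]].
    """
    sizes = [len(d[0]) for d in dists]
    labels = list(itertools.product(*[range(s) for s in sizes]))
    joint = []
    for x in range(2):
        row = []
        for lab in labels:
            pr = 1.0
            for n, y in enumerate(lab):
                pr *= dists[n][x][y]
            row.append(pr)
        joint.append(row)
    return labels, joint

# Hypothetical incoming message distribution with K = 3 values, used twice (d - 1 = 2).
msg = [[0.7, 0.2, 0.1],    # conditioned on X = 0
       [0.1, 0.2, 0.7]]    # conditioned on X = 1
labels, joint = cross_product([msg, msg])
```

The result has K^{d-1} = 9 entries per code bit; transposing `joint` and sorting by likelihood ratio gives a channel in the form the Quantization Algorithm expects.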

Classical density evolution is restricted to channels with cer- 
tain symmetry properties [3]. But here, arbitrary and asymmet- 
rical channels are allowed, and the optimized decoding maps, 
and thus the distributions, may be asymmetrical even if the 
channel was symmetrical. Fortunately, Wang et al. generalized 
density evolution to asymmetric channels [25]. They showed 
that while error rates are codeword-dependent, it is sufficient 
to consider the evolution of densities only for the two code 
bits, that is, densities conditioned on X = 0 and X = 1. The 
same method will be used here. 

B. Message-Passing Decoding of LDPC Codes

An arbitrary binary-input DMC is used for transmission;
it has binary input $X$ and $K_{\mathrm{ch}}$ outputs
$W \in \mathcal{W} = \{1, 2, \ldots, K_{\mathrm{ch}}\}$. The channel
transition probabilities are denoted by $r^{(0)}$:

$$r^{(0)}(x_0, y_0) = \Pr(W = y_0 \mid X = x_0). \tag{20}$$

In message-passing decoding [3], the variable-to-check messages
$R$ are from the message alphabet $\mathcal{R}$. Similarly, the
check-to-variable messages $L$ are from the message alphabet
$\mathcal{L}$. The sets $\mathcal{R}$ and $\mathcal{L}$ are discrete
with $|\mathcal{R}| \le K$ and $|\mathcal{L}| \le K$.

At iteration $\ell$, the check node with degree $d_c$ finds an
outgoing message using $d_c - 1$ incoming messages, by a
mapping function:

$$\chi^{(\ell)} : \mathcal{R}^{d_c - 1} \to \mathcal{L}. \tag{21}$$

Similarly, at iteration $\ell$, the variable node with degree $d_v$
finds an outgoing message using the channel value and $d_v - 1$
incoming messages, by a mapping function:

$$\Phi^{(\ell)} : \mathcal{W} \times \mathcal{L}^{d_v - 1} \to \mathcal{R}. \tag{22}$$



C. Density Evolution 

The object of interest is the density of the messages $R$ and
$L$. Because of possible asymmetries, density evolution tracks
the probabilities conditioned on both $X = 0$ and $X = 1$. On
iteration $\ell$, the probability distribution for $R$ is:

$$r^{(\ell)}(x, y) = \Pr(R = y \mid X = x), \tag{23}$$

with $y \in \mathcal{R}$, and the probability distribution for $L$ is:

$$l^{(\ell)}(x, y) = \Pr(L = y \mid X = x), \tag{24}$$

with $y \in \mathcal{L}$.

The method described here finds the message-passing decoding
maps $\chi$ and $\Phi$, as well as the probability distributions $r$
and $l$. In particular, for each iteration and each node type, there
are four steps: (a) given the node input distribution, a cross-
product distribution is found; (b) the Quantization Algorithm
produces a quantizer to $K$ levels; (c) the reduced distribution
is found, which is used in the next step of density evolution;
(d) the decoding mapping is found for each quantizer.

Notation is given first. Two functions $f_c$ and $f_v$ are of
interest when decoding LDPC codes. At the check node:

$$f_c(x_1, \ldots, x_{d_c-1}) = x_1 + \cdots + x_{d_c-1} \bmod 2, \tag{25}$$

and at the variable node:

$$f_v(x_0, \ldots, x_{d_v-1}) = \begin{cases} 0 & \text{if } x_0 = x_1 = \cdots = 0 \\ 1 & \text{if } x_0 = x_1 = \cdots = 1 \\ \text{undefined} & \text{otherwise,} \end{cases} \tag{26}$$

where $x_i$ are binary values. It is convenient to use a single
symbol that is a concatenation of the component messages
in the cross-product distribution. In the context of the check
node, let $y'$ denote the concatenation:

$$y' = (y_1, y_2, \ldots, y_{d_c-1}), \tag{27}$$

where $y' \in \mathcal{R}^{d_c-1}$. And in the context of the variable
node, let $y'$ denote the concatenation:

$$y' = (y_0, y_1, y_2, \ldots, y_{d_v-1}), \tag{28}$$

where $y' \in \mathcal{W} \times \mathcal{L}^{d_v-1}$.
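
As a small illustration of the notation in (25) to (28), the following Python sketch implements $f_c$ and $f_v$ and enumerates the concatenated symbols $y'$. The function names, the 0-based message labels and the use of `None` for the undefined case are choices made here for illustration; they are not taken from the paper.

```python
from itertools import product

def f_c(bits):
    """Check node function (25): modulo-2 sum of the input bits."""
    return sum(bits) % 2

def f_v(bits):
    """Variable node function (26): the common value if all bits agree.

    Returns None when the bits disagree (the 'undefined' case).
    """
    first = bits[0]
    return first if all(b == first for b in bits) else None

def check_node_inputs(K, d_c):
    """All concatenated symbols y' in R^(d_c - 1), as in (27).

    Messages are labelled 0..K-1 here, an arbitrary labelling.
    """
    return list(product(range(K), repeat=d_c - 1))

def variable_node_inputs(K_ch, K, d_v):
    """All concatenated symbols y' in W x L^(d_v - 1), as in (28)."""
    return list(product(range(K_ch), *[range(K)] * (d_v - 1)))
```

For example, with $K = 3$ and $d_c = 4$ the check node sees $3^3 = 27$ distinct symbols $y'$, which is the size of the cross-product distribution that is later reduced back to $K$ levels.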

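Step (a) is to find the cross-product distributions
$\tilde{l}^{(\ell)}(x, y')$ and $\tilde{r}^{(\ell)}(x, y')$, given by:

$$\tilde{l}^{(\ell)}(x, y') = \left(\frac{1}{2}\right)^{d_c - 2} \sum_{\mathbf{x} : f_c(\mathbf{x}) = x} \; \prod_{i=1}^{d_c - 1} r^{(\ell-1)}(x_i, y_i), \tag{29}$$

where $\mathbf{x} = (x_1, x_2, \ldots, x_{d_c-1})$, and

$$\tilde{r}^{(\ell)}(x, y') = \sum_{\mathbf{x} : f_v(\mathbf{x}) = x} r^{(0)}(x_0, y_0) \prod_{i=1}^{d_v - 1} l^{(\ell)}(x_i, y_i), \tag{30}$$

where $\mathbf{x} = (x_0, x_1, \ldots, x_{d_v-1})$.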
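
The cross-product step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: distributions are stored as nested lists `r[x][y]`, messages are labelled from 0, and the function names are hypothetical.

```python
from itertools import product

def check_cross_product(r, d_c):
    """Cross-product distribution (29) at a degree-d_c check node.

    r[x][y] = Pr(R = y | X = x) for x in {0, 1}.
    Returns (symbols, lt) with lt[x][j] = Pr(y' = symbols[j] | X = x).
    """
    K = len(r[0])
    n = d_c - 1
    symbols = list(product(range(K), repeat=n))
    lt = [[0.0] * len(symbols) for _ in range(2)]
    # Each of the 2**(d_c - 2) bit patterns with a given parity is equally likely.
    weight = 0.5 ** (d_c - 2)
    for bits in product((0, 1), repeat=n):
        x = sum(bits) % 2  # f_c applied to the bit pattern
        for j, ys in enumerate(symbols):
            p = weight
            for xi, yi in zip(bits, ys):
                p *= r[xi][yi]
            lt[x][j] += p
    return symbols, lt

def variable_cross_product(r0, l, d_v):
    """Cross-product distribution (30) at a degree-d_v variable node.

    r0[x][w] is the channel (20); l[x][y] = Pr(L = y | X = x).
    f_v is defined only when all bits agree, so a single term survives.
    """
    K_ch, K = len(r0[0]), len(l[0])
    symbols = list(product(range(K_ch), *[range(K)] * (d_v - 1)))
    rt = [[0.0] * len(symbols) for _ in range(2)]
    for x in (0, 1):
        for j, ys in enumerate(symbols):
            p = r0[x][ys[0]]
            for yi in ys[1:]:
                p *= l[x][yi]
            rt[x][j] = p
    return symbols, rt
```

Each row of the returned arrays sums to one, so the result is again a binary-input channel that can be handed to the Quantization Algorithm.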

Then, in step (b), the cross-product distribution is reduced
to $K$ levels using the Quantization Algorithm. The matrix-form
quantizers $Q_c^{(\ell)}$ and $Q_v^{(\ell)}$ are produced at each
iteration $\ell$, given by:

$$Q_c^{(\ell)} = \mathrm{Quant}(\tilde{l}^{(\ell)}, K) \quad \text{and} \tag{31}$$

$$Q_v^{(\ell)} = \mathrm{Quant}(\tilde{r}^{(\ell)}, K). \tag{32}$$

Since LDPC codes are linear codes, code bits 0 and 1 occur
with equal probability. Thus the input distribution
$p_1 = p_2 = 0.5$ is used by the Quantization Algorithm.
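
Step (c) is to find the reduced distributions as:

$$l^{(\ell)}(x, y) = \sum_{y' : Q_c^{(\ell)}(y, y') = 1} \tilde{l}^{(\ell)}(x, y') \quad \text{and} \tag{33}$$

$$r^{(\ell)}(x, y) = \sum_{y' : Q_v^{(\ell)}(y, y') = 1} \tilde{r}^{(\ell)}(x, y'), \tag{34}$$

where $Q(y, y')$ is the element in row $y$ and column $y'$ of the
matrix $Q$.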
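
Step (c) amounts to summing the columns of the cross-product distribution that the quantizer maps to the same output. The sketch below assumes the quantizer is stored as a 0/1 nested list `Q[y][j]` whose column order matches the cross-product symbols; the name `reduce_distribution` is hypothetical.

```python
def reduce_distribution(Q, cross):
    """Reduced distribution, as in (33) and (34).

    Q[y][j] is 1 if cross-product symbol j is mapped to output y, else 0.
    cross[x][j] is the cross-product distribution for x in {0, 1}.
    Returns out[x][y], the sum of cross[x][j] over columns j with Q[y][j] == 1.
    """
    K = len(Q)
    out = [[0.0] * K for _ in range(2)]
    for x in (0, 1):
        for y in range(K):
            out[x][y] = sum(p for q, p in zip(Q[y], cross[x]) if q == 1)
    return out
```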



Step (d) is to find the decoding maps, which are given by:

$$\chi^{(\ell)}(y') = y \ \text{ if } \ Q_c^{(\ell)}(y, y') = 1, \quad \text{and} \tag{35}$$

$$\Phi^{(\ell)}(y') = y \ \text{ if } \ Q_v^{(\ell)}(y, y') = 1. \tag{36}$$

Since each column of $Q_c^{(\ell)}$ and $Q_v^{(\ell)}$ has a single 1, the
functions $\chi^{(\ell)}(y')$ and $\Phi^{(\ell)}(y')$ are defined for all
values of $y'$. This step is not needed for density evolution.
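
In an implementation, the decoding map of step (d) is naturally a lookup table. The following sketch reads such a table off a matrix-form quantizer; the dictionary representation and the name `decoding_map` are illustrative assumptions.

```python
def decoding_map(Q, symbols):
    """Build a lookup table y' -> y from a matrix-form quantizer, as in (35)-(36).

    Assumes every column of Q contains exactly one 1 (a deterministic quantizer);
    symbols[j] is the concatenated input y' associated with column j.
    """
    table = {}
    for j, sym in enumerate(symbols):
        rows = [y for y in range(len(Q)) if Q[y][j] == 1]
        if len(rows) != 1:
            raise ValueError("column %d does not contain exactly one 1" % j)
        table[sym] = rows[0]
    return table
```

A decoder would then evaluate, for instance, `table[(y1, y2)]` at a degree-3 check node instead of performing any arithmetic on the incoming messages.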
More precisely, density evolution is as follows: 

1) Initialize with $\ell = 0$ and the channel message given by
(20).

2) Check node: Compute (29), followed by (31), followed 
by (33). 

3) Variable node: Compute (30), followed by (32), followed 
by (34). 

4) If the mutual information $I(X; R)$ approaches 1, then
declare convergence. If a fixed number of iterations is
exceeded, declare non-convergence. Otherwise, increment
$\ell$ and iterate from the check node step.

At steps 2 and 3 above, decoding mappings can also be 
obtained. 
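
The convergence test in step 4 needs the mutual information $I(X; R)$ of the current message distribution. A minimal sketch follows, assuming the distribution is stored as `r[x][y]` and the uniform input distribution is used; the tolerance in `has_converged` is an arbitrary illustrative value, not one given in the paper.

```python
from math import log2

def mutual_information(r, p=(0.5, 0.5)):
    """I(X; R) in bits for r[x][y] = Pr(R = y | X = x) and input distribution p."""
    K = len(r[0])
    total = 0.0
    for y in range(K):
        py = p[0] * r[0][y] + p[1] * r[1][y]
        for x in (0, 1):
            joint = p[x] * r[x][y]
            if joint > 0.0:
                total += joint * log2(r[x][y] / py)
    return total

def has_converged(r, tol=1e-6):
    """Stopping rule of step 4; the tolerance is an illustrative choice."""
    return mutual_information(r) >= 1.0 - tol
```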

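Density evolution steps (a), (b) and (c) may be schematically
represented as the following flow:

$$r^{(0)} \to \tilde{l}^{(1)} \xrightarrow{Q_c^{(1)}} l^{(1)} \to \tilde{r}^{(1)} \xrightarrow{Q_v^{(1)}} r^{(1)} \to \tilde{l}^{(2)} \xrightarrow{Q_c^{(2)}} l^{(2)} \to \tilde{r}^{(2)} \xrightarrow{Q_v^{(2)}} r^{(2)} \to \cdots$$

Also, step (d), finding the decoding mapping, can be
schematically represented as:

$$Q_c^{(1)} \to \chi^{(1)}, \quad Q_v^{(1)} \to \Phi^{(1)}, \quad Q_c^{(2)} \to \chi^{(2)}, \quad Q_v^{(2)} \to \Phi^{(2)}, \quad Q_c^{(3)} \to \chi^{(3)}, \ \ldots$$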



Decoding can be implemented as follows: on the first iteration,
the channel messages $W$ are sent to the check node, which uses
the mapping $\chi^{(1)}$ to generate check-to-variable messages $L$.
Using messages $L$ and $W$, the variable node uses the mapping
$\Phi^{(1)}$ to generate variable-to-check messages $R$. This continues
iteratively, until a stopping condition, such as convergence to
a codeword or a maximum number of iterations, is satisfied.

Note that the Quantization Algorithm may also be used to 
design a mapping function which makes a hard decision on 
$x$. For each iteration, repeat the variable node steps (a) to (c),
using all $d_v$ inputs, and quantize to $K = 2$ levels to make hard
decisions.

The complexity of this density evolution method is dominated
by the Quantization Algorithm at step (b). The complexity
is proportional to $K I^2$. For density evolution, $I = K^{d}$
for a degree-$d$ variable node and $I = K^{d-1}$ for a degree-$d$
check node. Thus, we have complexity proportional to roughly
$I^{2 + 1/d}$, which is slightly worse than quadratic.
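
To get a feel for these numbers, the following sketch evaluates the input alphabet size $I$ and the quantity $K I^2$ for given node degrees. It only reproduces the arithmetic; constant factors are ignored, the function names are hypothetical, and the variable-node size uses the exact count $K^{d_v-1} K_{\mathrm{ch}}$ rather than the approximation $K^{d}$.

```python
def num_inputs(K, d, K_ch=None):
    """Number I of cross-product symbols at a degree-d node.

    Check node: I = K**(d - 1). Variable node: I = K**(d - 1) * K_ch.
    """
    return K ** (d - 1) * (K_ch if K_ch is not None else 1)

def quantizer_cost(K, I):
    """Operation count proportional to K * I**2 (constant factors ignored)."""
    return K * I ** 2
```

For instance, with $K = 4$ a degree-6 check node has $I = 4^5 = 1024$ inputs and a cost on the order of $4 \cdot 1024^2 \approx 4.2 \times 10^6$, while a degree-3 variable node with $K_{\mathrm{ch}} = 8$ has $I = 128$.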

Fig. 6: Noise thresholds for various decoder message quantization and AWGN channel quantization. Amount of message quantization is $K$. Code is a rate 1/2 ($d_v = 3$, $d_c = 6$)
LDPC code.



VI. Numerical Results for LDPC Codes 

Using the method of the previous section, this section gives
noise thresholds for LDPC decoders with quantized messages,
on quantized AWGN channels. Also, examples of decoding
mappings $\chi$ and $\Phi$ are given.

A. Noise Thresholds for AWGN Channel

Because the complexity of the method given in the previous 
section is comparatively low, it can be effectively used as a 
way to compare various quantization schemes.

For the numerical evaluation, a binary-input AWGN channel
was quantized to $K_{\mathrm{ch}}$ levels. The specific quantization
levels were found by finely and uniformly quantizing the AWGN
channel, then applying the Quantization Algorithm. Then, the
message-passing decoder was restricted to using $K$ levels
per message ($\log_2 K$ bits per message). For variable nodes,
$I = K^{d_v - 1} \cdot K_{\mathrm{ch}}$ input messages are quantized into
$K$ output messages. For check nodes, $I = K^{d_c - 1}$ input
messages are quantized into $K$ output messages.
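
The first stage, finely and uniformly quantizing the AWGN channel, can be sketched as follows. The signal mapping (X = 0 sent as +1, X = 1 as -1), the clipping range `limit` and the number of cells are assumptions made for this illustration; the paper does not state the values it used.

```python
from math import erf, sqrt

def gaussian_cdf(t, mean, sigma):
    """CDF of a normal distribution with the given mean and standard deviation."""
    return 0.5 * (1.0 + erf((t - mean) / (sigma * sqrt(2.0))))

def finely_quantized_awgn(sigma, num_bins=100, limit=4.0):
    """Uniformly quantize a binary-input AWGN channel into a DMC.

    Assumes X = 0 is sent as +1 and X = 1 as -1 (an illustrative convention).
    The interval [-limit, limit] is split into num_bins equal cells; the two
    outermost cells extend to infinity so that each row sums to one.
    Returns P[x][j] = Pr(Y in cell j | X = x).
    """
    width = 2.0 * limit / num_bins
    edges = [-limit + j * width for j in range(1, num_bins)]
    P = []
    for mean in (+1.0, -1.0):
        cdf = [0.0] + [gaussian_cdf(t, mean, sigma) for t in edges] + [1.0]
        P.append([cdf[j + 1] - cdf[j] for j in range(num_bins)])
    return P
```

The resulting DMC would then be passed to the Quantization Algorithm to obtain the $K_{\mathrm{ch}}$-level channel quantizer.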

The results for a $d_c = 6$, $d_v = 3$, rate 1/2 regular LDPC
code are given in Fig. 6, which shows noise thresholds
versus the degree of channel quantization, for various amounts
of message quantization; also shown are unquantized noise
thresholds. The most significant observation is that noise
thresholds at $K = 16$ levels per message (4 bits per message)
are difficult to distinguish from those for which the messages
are not quantized. Note also that coarsely quantized messages
($K = 3$) with a finely quantized channel ($K_{\mathrm{ch}}$ = 12 to 32)
have better noise thresholds than finely quantized messages with
a coarsely quantized channel ($K_{\mathrm{ch}} = 2, 3$). This is interesting
from a practical point of view, since decoders with coarsely
quantized messages have more efficient implementations.

B. Finite-Length LDPC Codes 

Fig. 7 shows the bit error rate for a rate 1/4, $d_v = 3$ and
$d_c = 4$ LDPC code of block length 1000, constructed using
Gallager's method [26], on the binary symmetric channel.
Again, with $K = 16$ (4 bits per message), the decoder error
rate is similar to belief-propagation decoding using a floating
point implementation, for smaller BSC crossover probabilities.
The decoding maps were generated using a single BSC
crossover probability near the noise threshold, and this single
sequence of decoding maps was used over all the simulated
channels. Note that error floors, a common problem with quantized
decoders, do not appear above a probability of error of $10^{-8}$.

C. Algorithm E-like Decoding Mappings 

As shown in this subsection, the mappings derived by the 
proposed method are similar to those of the heuristically- 
derived Algorithm E [3]. 

TABLE I: Decoding mappings $\chi$ and $\Phi$ produced by the described method, for $K = 3$
and $K_{\mathrm{ch}} = 2$ (binary symmetric channel), similar conditions to Algorithm E.

(a) Check node mapping $\chi_1$

(b) Check node mapping $\chi_2$

Fig. 7: Finite-length simulation on binary symmetric channel. Rate 1/4, block length
1000 (3,6) regular LDPC code. Horizontal axis: BSC error probability.

Algorithm E corresponds to the case of $K_{\mathrm{ch}} = 2$ (binary
symmetric channel, BSC) and $K = 3$. Algorithm E is
a message-passing LDPC decoder for the BSC where the
decoder message set has three messages: 0, 1, and an erasure
message denoted by $e$. The variable node mapping performs
a "majority vote", using the incoming 0 or 1 messages, but
the channel message votes with weight $w^{(\ell)} \in \{1, 2, 3, \ldots\}$
on iteration $\ell$; an incoming message $e$ does not vote. If there
is a tie, then the output is $e$. The weight $w^{(\ell)}$ is chosen to
maximize mutual information. For the check node mapping,
if any incoming messages are $e$, then the output message is
$e$; otherwise the output is the modulo 2 sum of the inputs.
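
The Algorithm E rules described above are simple enough to state directly in code. The sketch below follows that verbal description; representing the erasure as the string `"e"` and the function names are choices made here for illustration.

```python
E = "e"  # erasure message

def check_node_E(incoming):
    """Check node rule: erasure if any input is erased, else modulo-2 sum."""
    if E in incoming:
        return E
    return sum(incoming) % 2

def variable_node_E(channel_bit, incoming, weight):
    """Variable node rule: weighted majority vote.

    The channel bit votes with the given weight, each incoming 0 or 1
    votes with weight one, and erasures do not vote. A tie gives an erasure.
    """
    votes = [0, 0]
    votes[channel_bit] += weight
    for m in incoming:
        if m != E:
            votes[m] += 1
    if votes[0] == votes[1]:
        return E
    return 0 if votes[0] > votes[1] else 1
```

With weight 2, a channel value of 0 outvotes a single incoming 1, so `variable_node_E(0, ("e", 1), 2)` gives 0, whereas two incoming 1 messages tie with the channel and give an erasure.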

For $d_v = 3$ codes with $d_c = 4$ and $d_c = 6$, the mapping
functions produced by the proposed method are shown in Table
I. The per-iteration mapping functions are shown in Table
II; the entries in Table II refer to mappings in Table I. The
message sets are $\mathcal{L} = \mathcal{R} = \{0, e, 1\}$. In Table I, (#0, #e, #1)
denotes the number of each type of message at the input. For
$d_c = 4$, the BSC crossover probability is 0.14, and for $d_c = 6$,
the BSC crossover probability is 0.06.

The check node mappings, in Table I-(a) and Table I-(b),
produced by this method are identical to those for Algorithm
E. The variable node mapping in Table I-(c) corresponds to
Algorithm E with weight $w = 2$, with two symmetrical
exceptions ((0,1,1) with $W = 0$ and (1,1,0) with $W = 1$
would map to 0 and 1, respectively, with Algorithm E, not $e$).
The variable node mapping in Table I-(d) is the same as
Algorithm E with weight $w = 1$.

The noise threshold reported for Algorithm E with $d_v = 3$,
$d_c = 6$ is 0.07 (0.084 for belief-propagation decoding) [3]. For
the proposed method, the noise threshold is 0.0708. Note that
the BSC crossover probabilities used to find the mappings in
Table I and Table II are slightly lower. In fact, not all choices
of the BSC crossover probability produced symmetrical tables.

(c) Variable node mapping $\Phi_1$

(d) Variable node mapping $\Phi_2$

TABLE II: Per-iteration decoding tables for the BSC with $K = 3$ levels per message,
for $d_c = 4$ and $d_c = 6$; the tables referred to are shown in Table I.

(a) $d_c = 4$ and BSC with crossover probability 0.14.

VII. Discussion

This paper showed that finding optimal quantizers for discrete
memoryless channels is a concave optimization problem.
For concave optimization problems in general, finding efficient
algorithms is highly problem-dependent. Indeed, for this spe- 
cific concave optimization problem, the structure of the mutual 
information objective function was carefully considered to ar- 
rive at an efficient optimization method. Concave optimization 
results were invoked in the proof of Lemma 2 and related 
convexity properties are used in the proof of Lemma 4, in the 
Appendix. 

This method applies to arbitrary discrete memoryless channels.
Many previous papers concentrated on specific channels,
particularly continuous-to-discrete quantization of AWGN
channels, and moreover optimal solutions were not obtained.
While some DMCs can be obtained by quantizing AWGN
channels, there are arbitrary DMCs which cannot be obtained
this way. Thus, our results are, in one sense, more general. In
addition, the DMC quantization algorithm may be applied to
continuous-output channels, such as the binary-input AWGN
channel, by first finely quantizing the channel, then applying
the Quantization Algorithm to the resulting DMC.



The optimal quantizer is deterministic, rather than stochas- 
tic. As far as we are aware, all previous work assumed 
deterministic quantizers. From a mathematical viewpoint, there 
is no reason to a priori assume that deterministic quantizers 
are optimal, and disallow stochastic quantizers; in the rate- 
distortion problem, the optimal quantizer is usually stochastic. 
From an engineering viewpoint, deterministic quantizers are 
far more practical. Lemma 2 shows that this engineering 
preference is in fact optimal. 

This paper's results are mostly restricted to DMCs with 
binary inputs. The Quantization Algorithm assumed that the 
channel outputs can be sorted; but for a channel with three or 
more inputs, there is no clear sorting of the channel outputs. 
In fact, the motivation for this paper was optimal quantization 
of state metrics for the BCJR algorithm (with more than 
two states) and of non-binary LDPC codes; see earlier work 
[27] and [28]. These problems could be solved using the 
quantization of a discrete memoryless channel with non-binary 
inputs, if such a method existed. Future extensions to non- 
binary inputs could use techniques similar to those developed 
here, so this paper forms a basis for solving the non-binary 
input optimization problem. 

Throughout this paper a fixed input distribution was used, 
which corresponds to finding an achievable rate, for a given 
channel and a given number of quantization levels. However, 
computing the capacity of a channel subject to a quantization 
restriction is of considerable interest. This would require find- 
ing an input distribution and quantizer which jointly maximize 
mutual information. This paper provides a foundation for 
addressing this problem. 

The other major result in this paper was to show that the 
Quantization Algorithm can be applied to implementation of 
LDPC decoders. While it has been known for some time that 
using non-uniform quantizers that change from iteration to 
iteration can improve performance, the selection of quantizers 
is ad hoc. A contribution of this paper is to give a systematic 
approach for generating the quantizers. Indeed, this approach 
generates quantizers which are non-uniform and change as the 
iterations progress. 

This method selects quantizers and message-passing de- 
coding maps which maximize mutual information at each 
decoding step. Clearly, it would be more desirable to find 
decoding maps which maximize mutual information over all 
iterations. However, as we have shown, this per-iteration 
optimization has good performance, and yields some insight 
into efficient quantization methods. 

Appendix

After giving some notation and terminology, this Appendix 
states and proves Lemma 4. Then, Lemma 5 is stated and 
proved using Lemma 4. Next, Lemma 3 from Sec. III is proved
using Lemma 5. Finally, the Theorem is proved, using Lemma 
3. 

The theme of Lemmas 3, 4 and 5 is that for any quantizer
in which the preimage of a quantizer output does not consist
of consecutive channel outputs, there exists another
quantizer with higher partial mutual information. Lemma 4
applies for the case of alternately assigning even and odd
numbered channel outputs to two quantizer outputs. Lemma 5
applies to the case of any assignment to two quantizer outputs.
Finally, Lemma 3 applies to an arbitrary number of channel
outputs.

A. Preliminaries 

Consider some subset of the channel outputs in which all
subset members are assigned to one of two quantizer outputs;
the size of the subset is $I'$, with $I' \le I$. Let $c_i = p_1 P_{i|1}$
and $b_i = p_2 P_{i|2}$ for $i = 1, 2, \ldots, I'$. The channel outputs are
indexed and sorted:

$$\frac{c_1}{b_1} < \frac{c_2}{b_2} < \cdots < \frac{c_{I'}}{b_{I'}}, \tag{37}$$

which corresponds to the condition (11) for the indices
$\{1, 2, \ldots, I'\}$ (see also Remark 1). Let $C = \sum_{i=1}^{I'} c_i \le 1$
and $B = \sum_{i=1}^{I'} b_i \le 1$. For fixed values of $C$ and $B$, let
$\mathcal{F}_{C,B}$ be given by:

$$\mathcal{F}_{C,B} = \{(c, b) \mid 0 \le c \le C,\ 0 \le b \le B\}, \tag{38}$$

so that $(c_i, b_i) \in \mathcal{F}_{C,B}$ for $i = 1, 2, \ldots, I'$.

Remark 2. Note that the indexing $i = 1, 2, \ldots, I'$ is for
notational convenience only. In what follows, we could take
any subset of size $I'$ of the $I$ channel outputs. Clearly if (11)
holds for $I$ channel outputs, then (37) will also hold.

Since this part of the Appendix is concerned with mapping
only a subset of channel outputs to two quantizer outputs, we
are interested in two assignments $\mathcal{A}_1$ and $\mathcal{A}_2$ such that
$\mathcal{A}_1 \cup \mathcal{A}_2 = \{1, 2, \ldots, I'\}$ (as with Remark 2, the subscripts
1 and 2 should be regarded as a convenience). The terms
"assignment" and "preimage" are used interchangeably.

For these two quantizer outputs, define a partial mutual
information function $\iota(c, b)$ for $(c, b) \in \mathcal{F}_{C,B}$ as follows:

$$\iota(c, b) = c \log \frac{c / p_1}{c + b} + b \log \frac{b / p_2}{c + b} + (C - c) \log \frac{(C - c) / p_1}{(C - c) + (B - b)} + (B - b) \log \frac{(B - b) / p_2}{(C - c) + (B - b)}. \tag{39}$$

Note that (39) can also be expressed as

$$\iota(c, b) = c \log \frac{c}{c + b} + b \log \frac{b}{c + b} + (C - c) \log \frac{C - c}{(C - c) + (B - b)} + (B - b) \log \frac{B - b}{(C - c) + (B - b)} - C \log p_1 - B \log p_2, \tag{40}$$

where the last two terms on the right-hand side do not depend
on the values $c$ and $b$.

The partial mutual information function $\iota(c, b)$ expresses the
sum of partial mutual information of two quantizer outputs. By
letting $c = \sum_{i \in \mathcal{A}_1} c_i$ and $b = \sum_{i \in \mathcal{A}_1} b_i$ for an
assignment $\mathcal{A}_1$ and $\mathcal{A}_2$, it is readily seen that

$$\iota(c, b) = \iota_1 + \iota_2, \tag{41}$$

where $\iota_m$ with $m = 1, 2$ is defined in (16).

If $I' = 5$, $\mathcal{A}_1 = \{1, 2\}$ and $\mathcal{A}_2 = \{3, 4, 5\}$, then the partial
mutual information of this assignment is $\iota(c_1 + c_2, b_1 + b_2)$.

For fixed $C$ and $B$ with $0 < C \le 1$ and $0 < B \le 1$,
the partial mutual information function has the following
properties:

(i) For any values $(c, b) \in \mathcal{F}_{C,B}$:

$$\iota(c, b) = \iota(C - c, B - b). \tag{42}$$

That is, the function $\iota(c, b)$ is symmetric with respect to
the point $(\frac{C}{2}, \frac{B}{2})$.

(ii) For fixed $p_j$, the function $\iota(c, b)$ is convex in $(c, b)$, that
is, for any distinct $(c', b') \in \mathcal{F}_{C,B}$ and $(c'', b'') \in \mathcal{F}_{C,B}$,

$$\iota(c, b) \le \theta \iota(c', b') + (1 - \theta) \iota(c'', b'') \quad \text{for } 0 < \theta < 1, \tag{43}$$

where $c = \theta c' + (1 - \theta) c''$ and $b = \theta b' + (1 - \theta) b''$,
with equality if and only if

$$\frac{c'}{b'} = \frac{c''}{b''} = \frac{C}{B}. \tag{44}$$

Inequality (43) and equation (44) can be shown by the
log-sum inequality [4, Theorem 2.7.1]. It follows from
the equality condition that the partial mutual information
is strictly convex over any sub-region $\mathcal{F}' \subset \mathcal{F}_{C,B}$ which
does not include a segment of the line $\frac{c}{b} = \frac{C}{B}$, that is,
the interior of $\mathcal{F}' \cap \{(c, b) \mid \frac{c}{b} = \frac{C}{B}\}$ is empty.

(iii) For any values $(c, b) \in \mathcal{F}_{C,B}$:

$$\iota(c, b) \ge C \log \frac{C}{C + B} + B \log \frac{B}{C + B} - C \log p_1 - B \log p_2, \tag{45}$$

which holds with equality if and only if $\frac{c}{b} = \frac{C}{B}$.
Equation (45) can be easily shown by applying the log-sum
inequality to the first and the third terms and the
second and the fourth terms on the right-hand side of (40),
respectively.
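
Properties (i) to (iii) are easy to check numerically. The sketch below evaluates the partial mutual information in the form (40) and the lower bound (45); the natural logarithm, the convention $0 \log 0 = 0$ and the helper names `xlog`, `iota` and `lower_bound` are assumptions made for this illustration.

```python
from math import log

def xlog(a, s):
    """a * log(a / s), with the convention 0 * log(0) = 0."""
    return a * log(a / s) if a > 0.0 else 0.0

def iota(c, b, C, B, p1=0.5, p2=0.5):
    """Partial mutual information, written in the form (40) (natural log)."""
    s1 = c + b
    s2 = (C - c) + (B - b)
    return (xlog(c, s1) + xlog(b, s1)
            + xlog(C - c, s2) + xlog(B - b, s2)
            - C * log(p1) - B * log(p2))

def lower_bound(C, B, p1=0.5, p2=0.5):
    """Right-hand side of (45), attained on the line c/b = C/B."""
    return xlog(C, C + B) + xlog(B, C + B) - C * log(p1) - B * log(p2)
```

With $C = B = 0.5$, for example, `iota(0.3, 0.1, 0.5, 0.5)` equals `iota(0.2, 0.4, 0.5, 0.5)` by the symmetry (42), and `iota(0.2, 0.2, 0.5, 0.5)` coincides with `lower_bound(0.5, 0.5)` because the point lies on the diagonal.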
Define the following points,

$$q_i = (c_i, b_i), \quad \text{for } i = 1, \ldots, I', \tag{46}$$

so that $q_i \in \mathcal{F}_{C,B}$. These points and the region $\mathcal{F}_{C,B}$ are
illustrated in Fig. 8 and Fig. 9, for $I' = 4$. If the quantizer
consists of $\mathcal{A}_1 = \{i\}$ and $\mathcal{A}_2 = \{1, 2, 3, 4\} \setminus \{i\}$, then the
partial mutual information for this assignment can be seen in the
graph as the value of the level curves at $q_i = (c_i, b_i)$. Similarly,
if $\mathcal{A}_1 = \{1, 3\}$ and $\mathcal{A}_2 = \{2, 4\}$, then the partial mutual
information can be seen as the value of the level curves at
$q_{13} = (c_1 + c_3, b_1 + b_3)$.

Fig. 8 and Fig. 9 illustrate examples for two important cases.
Fig. 8 shows the case of $\frac{b_1 + b_3}{c_1 + c_3} > \frac{b_2}{c_2}$ and Fig. 9 shows
the case of $\frac{b_1 + b_3}{c_1 + c_3} < \frac{b_2}{c_2}$. The sorting assumption (37)
implies that the slope of the line passing through $(c_i, b_i)$ and
$(0, 0)$ decreases as $i$ increases. The diagonal line corresponds to
the points satisfying $\frac{c}{b} = \frac{C}{B}$, and partial mutual information
on this line is the minimum in $\mathcal{F}_{C,B}$, by property (iii).

Now, define the following terminology. An "odd-even
assignment" is an assignment where the $I'$ channel outputs
indexed $\{1, 2, 3, \ldots, I'\}$ are alternately assigned to two
quantizer outputs 1 and 2 with sets $\mathcal{A}_1$ and $\mathcal{A}_2$. The odd-even
assignment is $\mathcal{A}_1 = \{1, 3, \ldots, I'\}$ and $\mathcal{A}_2 = \{2, 4, \ldots, I' - 1\}$
(for $I'$ odd), or $\mathcal{A}_1 = \{1, 3, \ldots, I' - 1\}$ and
$\mathcal{A}_2 = \{2, 4, \ldots, I'\}$ (for $I'$ even).

In an arbitrary mapping to two quantizer outputs, a consecutive
sequence of channel outputs may be mapped to the same
quantizer output. Any maximal such set is called an "output
group." Denote by $H$ the number of such groups. For example,
if eight channel outputs are assigned as

$$\mathcal{A}_1 = \{1, 2, 3, 6, 7\} \quad \text{and} \quad \mathcal{A}_2 = \{4, 5, 8\}, \tag{47}$$

then $H = 4$. The four output groups are (1, 2, 3), (4, 5), (6, 7),
and (8). The minimum value of $H$ is two. For the odd-even
assignment, $H = I'$.

An assignment $\mathcal{A}_1$ and $\mathcal{A}_2$ is called "consecutive" if
$\max(\mathcal{A}_1) < \min(\mathcal{A}_2)$. Otherwise, the assignment is called
"non-consecutive." An assignment is consecutive if and only
if the number of output groups $H = 2$. For example, the
assignment $\mathcal{A}_1 = \{1, 3\}$ and $\mathcal{A}_2 = \{4, 5, 7\}$ is consecutive.
On the other hand, $\mathcal{A}_1 = \{1, 4\}$ and $\mathcal{A}_2 = \{3, 5, 7\}$ is a
non-consecutive assignment.
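
The two definitions above can be made concrete with a short sketch. The function names are hypothetical, and `is_consecutive` follows the stated definition literally, so it relies on the convention that the lower-indexed outputs are labelled $\mathcal{A}_1$.

```python
def count_output_groups(A1, A2):
    """Number H of output groups for a two-output assignment.

    The indices in A1 and A2 are scanned in increasing order; a new group
    starts whenever the quantizer output changes.
    """
    label = {i: 1 for i in A1}
    label.update({i: 2 for i in A2})
    groups, previous = 0, None
    for i in sorted(label):
        if label[i] != previous:
            groups += 1
            previous = label[i]
    return groups

def is_consecutive(A1, A2):
    """Consecutive means max(A1) < min(A2), as in the definition above."""
    return max(A1) < min(A2)
```

Applied to the assignment in (47), `count_output_groups({1, 2, 3, 6, 7}, {4, 5, 8})` returns 4, matching the four output groups listed in the text.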

B. Lemma 4 

The following lemma states that reducing the number of 
output groups will increase partial mutual information. 

Lemma 4. For the odd-even assignment with $I' \ge 3$ channel
outputs satisfying (37) and $H' = I'$ output groups, let the
partial mutual information be denoted by $\iota'$. Then, there exists
another assignment with $H''$ output groups and partial mutual
information $\iota''$, with $H'' < H'$ and $\iota'' > \iota'$.

To illustrate Lemma 4, consider I' = 3. There are three 
non-trivial ways to assign the three channel outputs to two 
quantizer outputs: 

• Quantizer 1: combine outputs 2 and 3 (H = 2 output 
groups), 

• Quantizer 2: combine outputs 1 and 3 (odd-even; H = 3 
output groups), 

• Quantizer 3: combine outputs 1 and 2 (H = 2 output 
groups). 

Partial mutual information for each quantizer is $\iota(c_1, b_1)$,
$\iota(c_2, b_2)$ and $\iota(c_3, b_3)$, respectively, where property (i) was
used for Quantizer 2. From Lemma 4, Quantizer 1 or Quantizer
3 always has greater partial mutual information than Quantizer
2. That is,

$$\iota(c_2, b_2) < \max\bigl(\iota(c_1, b_1), \iota(c_3, b_3)\bigr). \tag{48}$$

Quantizer 1 and Quantizer 3 each have two output groups,
which is strictly fewer than the three output groups of
Quantizer 2.

Intuitively, it is expected that the partial mutual information
for some channel with $H' = 4$ output groups should be lower
than the same channel with $H' = 3$ output groups. But for
mathematical completeness, Lemma 4 is proved for arbitrary
$H'$. The proof for $I' = 3$ has some simplifications, so it will
be proved for $I' = 4$ first, then extended to higher values.

Fig. 8: Partial mutual information $\iota(c, b)$ is shown using level curves. Case of $\frac{b_1 + b_3}{c_1 + c_3} > \frac{b_2}{c_2}$.



Proof. Consider $I' = 4$ and $H' = 4$. The odd-even assignment
has partial mutual information $\iota_{13} = \iota(c_1 + c_3, b_1 + b_3)$.
Consider cases of three other assignments:

• Quantizer 1: $\mathcal{A}_1 = \{1, 2, 3\}$ and $\mathcal{A}_2 = \{4\}$, has $H'' = 2$,
with partial mutual information
$\iota_{123} = \iota(c_1 + c_2 + c_3, b_1 + b_2 + b_3)$,

• Quantizer 2: $\mathcal{A}_1 = \{1\}$ and $\mathcal{A}_2 = \{2, 3, 4\}$, has $H'' = 2$,
with partial mutual information $\iota_1 = \iota(c_1, b_1)$, and

• Quantizer 3: $\mathcal{A}_1 = \{1, 2\}$ and $\mathcal{A}_2 = \{3, 4\}$, has $H'' = 2$,
with partial mutual information $\iota_{12} = \iota(c_1 + c_2, b_1 + b_2)$.

It will be shown that one of these three assignments has
greater partial mutual information than the odd-even
assignment, that is:

$$\iota_{13} < \max(\iota_1, \iota_{123}, \iota_{12}). \tag{49}$$

Since $H'' < H'$ for each of these three cases, the Lemma
will hold for $I' = 4$.

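Define the points $q_{13}, q_{123} \in \mathcal{F}_{C,B}$ as:

$$q_{13} = (c_1 + c_3, b_1 + b_3) \quad \text{and} \tag{50}$$

$$q_{123} = (c_1 + c_2 + c_3, b_1 + b_2 + b_3). \tag{51}$$

Define the following rays:

$$R_0 = \{\theta q_{13} \mid \theta \ge 1\}, \tag{52}$$

$$R_1 = \{q_{13} + \theta (q_{123} - q_{13}) \mid \theta \ge 0\} \quad \text{and} \tag{53}$$

$$R_2 = \{q_{13} + \theta (q_{123} - q_{13}) \mid \theta \le 0\}. \tag{54}$$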


The union of $R_1$ and $R_2$ is a line, which has a minimum in
partial mutual information. The minimum is either in $R_1$ or
in $R_2$. We consider the following three cases, corresponding
to the three quantizers:

Case 1 The minimum is in $R_2$. Then by (strict) convexity
of partial mutual information, property (ii):

$$\iota_{13} < \iota_{123}. \tag{55}$$

Case 2 The minimum is in $R_1$, and $\frac{b_1 + b_3}{c_1 + c_3} > \frac{b_2}{c_2}$; please
refer to Fig. 8. Then we can show:

$$\iota_{13} < \iota_1, \tag{56}$$

as follows. Since the minimum in $R_1 \cup R_2$ is in $R_1$, the
minimum in $R_2$ alone is at $q_{13}$. The line coincident with $R_0$
passes through the origin, and the origin is the minimum along
this line, by property (iii). So, by similar convexity arguments,
the minimum along $R_0$ is also at $q_{13}$. In other words, all the
points on $R_2 \setminus \{q_{13}\}$ and on $R_0 \setminus \{q_{13}\}$ have greater partial
mutual information than $q_{13}$.

The two rays $R_0$ and $R_2$ define a cone $D$ with vertex $q_{13}$.
Observe that if $\iota(q_{13}) < \iota(z_i)$ for all $z_i \in R_i$ with $i = 0, 2$,
then $\iota(q_{13}) < \iota(x)$ for all $x \in D$. That is, the partial mutual
information for any point inside $D$ is greater than that for the
vertex $q_{13}$. This claim can be verified as follows. Consider a
ray $R_3$ that is defined as

$$R_3 = \{(c, b) \mid c = c_1 + c_3,\ b \ge b_1 + b_3\}. \tag{57}$$

Fig. 9: Partial mutual information $\iota(c, b)$ is shown using level curves. Case of $\frac{b_1 + b_3}{c_1 + c_3} < \frac{b_2}{c_2}$.



Then the cone $D$ is divided into two cones with the vertex
$q_{13}$: the cone $D_0$ defined by the rays $R_0$ and $R_3$, and the cone
$D_2$ defined by the rays $R_2$ and $R_3$. For any point $q' \in D_0$, the
line, denoted by $l_0(q')$, that passes through $q'$ and $q_{13}$ crosses
the line $\frac{c}{b} = \frac{C}{B}$. This means that all points in the intersection
of $l_0(q')$ and $D_0$ have larger partial mutual information than
$q_{13}$, and thus the partial mutual information for any point
in $D_0 \setminus \{q_{13}\}$ is greater than that for the vertex $q_{13}$. Now, we
turn to considering $D_2$. For any point $q' \in D_2$, the vertical
line, denoted by $l_2(q')$, passing through $q'$ also crosses the line
$\frac{c}{b} = \frac{C}{B}$. Let $z_2$ denote the intersection of $l_2(q')$ and $R_2$. Then
from the same reasoning, the partial mutual information for
the point $q' \in D_2$ is greater than that for $z_2$, which is greater
than that for $q_{13}$, since $z_2$ is on $R_2$.

All that remains is to show that $q_1$ is inside the cone $D$. We
use a geometrical argument using Fig. 8, for fixed values of
$C$ and $B$. The points $q_i$ for $i \in \{1, \ldots, 4\}$, $q_{13}$, and $q_{123}$ are
plotted. In addition, the cone formed by $R_0$ and $R_2$ is shown.
Consider a ray $R'$ defined as

$$R' = \{q_1 + \theta (q_{13} - q_1) \mid \theta \ge 0\}, \tag{58}$$

which stems from $q_1$ and passes through $q_{13}$. Geometrically,
it is clear that if the slope of $R'$ is less than the slope of $R_2$,
then $q_1$ is inside this cone. The slope of $R'$ is $\frac{b_3}{c_3}$ and the slope
of $R_2$ is $\frac{b_2}{c_2}$, so this holds by assumption (37). Thus, $q_1$ is
inside the cone and $\iota_{13} < \iota_1$.

Case 3 The minimum is in $R_1$, and $\frac{b_1 + b_3}{c_1 + c_3} < \frac{b_2}{c_2}$; please
refer to Fig. 9. In this case, $q_{13}$ is always below the line
$\frac{c}{b} = \frac{C}{B}$, since otherwise the ray $R_2$ crosses the line
$\frac{c}{b} = \frac{C}{B}$ and the point having the minimum partial mutual
information is always in $R_2$. Using the same reasoning as in
Case 2, it can be shown that $q_3$, rather than $q_1$, is inside the
cone defined by $R_0$ and $R_2$ (see Fig. 9). Further, the point
$q_{34} = q_3 + q_4$ is also inside this cone since the slope of $q_4$ is
less than the slope of $R_2$. Property (i) indicates
$\iota_{12} = \iota(c_3 + c_4, b_3 + b_4)$, and we can show

$$\iota_{13} < \iota_{12}. \tag{59}$$



Since at least one of Case 1, Case 2 or Case 3 always holds,
(49) always holds for $I' = 4$.

Finally, the above argument is extended to arbitrary $I'$. The
partial mutual information of the odd-even assignment, which
has $H' = I'$, is denoted by $\iota_O$. The generalizations of the three
cases are:

• Quantizer 1: $\mathcal{A}_1 = \{1, 2, 3, 5, 7, \ldots\}$ and
$\mathcal{A}_2 = \{4, 6, 8, \ldots\}$, has $H'' = H' - 2$, with partial mutual
information $\iota_A$:

$$\iota_A = \iota\Bigl(\sum_{i \in \mathcal{A}_1} c_i, \sum_{i \in \mathcal{A}_1} b_i\Bigr), \tag{60}$$

• Quantizer 2: $\mathcal{A}_1 = \{1\}$ and $\mathcal{A}_2 = \{2, 3, 4, 5, \ldots\}$, has
$H'' = 2$, with partial mutual information $\iota_B$:

$$\iota_B = \iota(c_1, b_1), \tag{61}$$

• Quantizer 3: $\mathcal{A}_1 = \{1, 2\}$ and $\mathcal{A}_2 = \{3, 4, 5, \ldots\}$, has
$H'' = 2$, with partial mutual information $\iota_C$:

$$\iota_C = \iota(c_1 + c_2, b_1 + b_2). \tag{62}$$

The generalization can be seen if the odd-even assignment
is written as $\mathcal{A}_1 = \{(1), (3, 5, 7, \ldots)\}$ and
$\mathcal{A}_2 = \{(2), (4, 6, 8, \ldots)\}$. Then $(3, 5, 7, \ldots)$ and $(4, 6, 8, \ldots)$
correspond to channel outputs (3) and (4), respectively, from the
$I' = 4$ odd-even assignment.

Following the same arguments as in the case $I' = 4$, we
can show:

$$\iota_O < \max(\iota_A, \iota_B, \iota_C). \tag{63}$$

Clearly the number of output groups $H''$ is lower in each case.
This completes the proof of Lemma 4.

It should be noted that the two-dimensional Figs. 8 and 9 do
not correspond directly to the concave optimization problem.
With $K = 2$ and $I' = 4$, the concave optimization problem of
Lemma 4 is over $K \cdot I' = 8$ dimensions. The polytope feasible
region has $2^4$ vertices and thus $2^4$ candidate solutions. The
various points $q_1$, $q_2$, etc. in the figures correspond to some of
these solutions. In this way, a high-dimensional optimization
problem was reduced to a two-dimensional problem.

C. Lemma 5 

The following Lemma 5 states that consecutive assignments 
have higher partial mutual information than non-consecutive 
assignments, for any two quantizer outputs. 

Lemma 5. Assume that (37) is satisfied, and let the assignment,
$\mathcal{A}_m$ and $\mathcal{A}_n$, have partial mutual information $\iota_{m,n}$. If
the assignment, $\mathcal{A}_m$ and $\mathcal{A}_n$, is non-consecutive, then there
exists a consecutive assignment, $\mathcal{A}'_m$ and $\mathcal{A}'_n$, with partial
mutual information $\iota'_{m,n}$, such that
$\mathcal{A}_m \cup \mathcal{A}_n = \mathcal{A}'_m \cup \mathcal{A}'_n$ and $\iota'_{m,n} > \iota_{m,n}$.

Proof. While Lemma 5 holds for arbitrary $I'$-to-2 assignments,
it will first be shown that it is sufficient to consider odd-
even assignments. Then, Lemma 4 will be applied recursively
to odd-even assignments.

For an assignment which is not odd-even, we can form
a new channel which combines each output group into a
single output. This results in a new channel with an odd-even
assignment which has the same partial mutual information
as the original channel. In what follows, Lemma 4 can be
regarded as being applied to the new channel, where each
channel output represents a group of the original channel.
Thus, it is sufficient to consider odd-even assignments.

Lemma 4 can be applied recursively until the number of
output groups is two. If the initial assignment has $H'$ output
groups and mutual information $\iota'$, then by Lemma 4, there
exists another assignment with $H'' < H'$ and $\iota'' > \iota'$. If this
is not an odd-even assignment, then each output group can be
combined to form a single channel output of a new channel,
resulting in an odd-even assignment. Continuing recursively,
the assignments can continue until the number of output
groups is two, corresponding to a consecutive assignment.
Lemma 4 guarantees that, at each step of the recursion, the
partial mutual information increases. This completes the proof
of Lemma 5.

D. Proof of Lemma 3 

To prove Lemma 3, it is sufficient to show that for any
quantizer which contains at least one non-consecutive assignment,
there exists a consecutive quantizer with strictly greater
mutual information. Define "pairwise consecutive" as any two
output groups which satisfy the definition of consecutive given
earlier.

Let $Q, Q', Q'', \ldots$ be a sequence of quantizers that
respectively produce total mutual information $i, i', i'', \ldots$.
Assume that $Q$ has at least one pair of output groups which are
not pairwise consecutive. Select one such pair. By Lemma 5, there
exists a new quantizer $Q'$, obtained by relabeling only that pair,
whose partial mutual information for the pair is strictly greater.
Since the remaining output groups are unchanged, the total mutual
information is also strictly greater, $i' > i$. Note that $Q'$ may
still have pairs of output groups which are not pairwise
consecutive. Lemma 5 can be applied recursively, obtaining a
sequence of quantizers $Q, Q', Q'', \ldots, Q_{\mathrm{end}}$ which
have total mutual information satisfying
$i < i' < i'' < \cdots < i_{\mathrm{end}}$.
The recursion terminates with the quantizer $Q_{\mathrm{end}}$,
which is the first quantizer in the sequence for which all
assignments are pairwise consecutive.
Note that the recursion is guaranteed to terminate because
there are a finite number of possible quantizers. Furthermore,
since Lemma 5 provides a strict inequality, the new mutual
information is higher than all previous values, and thus the
new quantizer is distinct. That is, a quantizer can appear
in the sequence only one time. Thus, a quantizer with
non-consecutive labeling cannot maximize mutual information at
the quantizer output. This completes the proof of Lemma 3.
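Lemma 3 can be checked numerically on a small channel by exhaustive search. The following is a minimal sketch, not part of the proof: it assumes a joint distribution stored as `joint[x][y]` with the channel outputs already ordered by likelihood ratio, and the names `mutual_information`, `is_consecutive`, `brute_force_best`, and the example `JOINT` are hypothetical choices made here for illustration.

```python
import itertools
import math

def mutual_information(joint, labels, K):
    """I(X;Z) in bits when channel output y is mapped to quantizer output labels[y]."""
    n_in = len(joint)
    px = [sum(row) for row in joint]                 # input distribution P(x)
    pxz = [[0.0] * K for _ in range(n_in)]           # joint distribution P(x, z)
    for y, z in enumerate(labels):
        for x in range(n_in):
            pxz[x][z] += joint[x][y]
    pz = [sum(pxz[x][z] for x in range(n_in)) for z in range(K)]
    return sum(p * math.log2(p / (px[x] * pz[z]))
               for x in range(n_in)
               for z, p in enumerate(pxz[x]) if p > 0)

def is_consecutive(labels):
    """True if every quantizer output occupies one contiguous run of channel outputs."""
    seen = set()
    prev = None
    for z in labels:
        if z != prev:
            if z in seen:        # label reappears after a gap: non-consecutive
                return False
            seen.add(z)
            prev = z
    return True

def brute_force_best(joint, K):
    """Search every assignment of channel outputs to K non-empty quantizer outputs."""
    M = len(joint[0])
    candidates = (lab for lab in itertools.product(range(K), repeat=M)
                  if len(set(lab)) == K)
    return max((mutual_information(joint, lab, K), lab) for lab in candidates)

# Assumed example: uniform binary input, five outputs with likelihood
# ratios 8, 3, 1, 1/3, 1/8 (already sorted).
JOINT = [[0.200, 0.150, 0.075, 0.050, 0.025],
         [0.025, 0.050, 0.075, 0.150, 0.200]]

best_mi, best_labels = brute_force_best(JOINT, 3)
print(best_mi, best_labels, is_consecutive(best_labels))
```

The search visits on the order of $K^M$ assignments, so it is only practical for very small channels; it serves as a sanity check on the lemma rather than as a design method.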

E. Proof of the Theorem's Optimality Part

In this section, the optimality part of the Theorem is proved. 
Lemma 3 is a necessary condition on quantizer optimality. 
In particular, the optimal quantizer satisfies the condition 
that $a_1 < a_2 < \cdots < a_{K-1}$, where the $a_k$ denote the output
group boundaries. This is the exact condition over which 
the Quantization Algorithm searches. Thus, it is sufficient to 
show that among these quantizers, the Quantization Algorithm, 
which is the realization of a dynamic program, will output the 
quantizer which maximizes mutual information. 

In the language of dynamic programming, a problem ex- 
hibits optimal substructure if the optimal solution contains 
optimal solutions to subproblems. If this condition holds, 
then dynamic programming provides the optimal solution, and 
moreover, the optimal substructure should be exploited in the 
optimization [29, Sec. 15.3]. 

For the Theorem, the subproblem consists of finding the 
quantizer which maximizes partial mutual information for 
some partial quantization of the outputs. In detail, recall $S_k(a)$
is the maximum of partial mutual information when channel
outputs 1 to $a$ are quantized to quantizer outputs 1 to $k$,

$$S_k(a) = \max_{a'} \bigl( S_{k-1}(a') + i(a' \to a) \bigr), \qquad (64)$$

where the maximization is over $a' \in \{k-1, \ldots, a-1\}$.
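The recursion (64) can be sketched in a few lines of code. This is an illustrative sketch under stated assumptions, not the authors' implementation: the joint distribution is stored as `joint[x][y]` with outputs already sorted by likelihood ratio, indices are 0-based and half-open (group `a_prev..a-1`), the base case $S_0(0) = 0$ is assumed, and the names `partial_mi` and `quantize` are hypothetical.

```python
import math

def partial_mi(joint, lo, hi):
    """Contribution to I(X;Z), in bits, of one quantizer output that
    collects channel outputs lo, ..., hi-1."""
    px = [sum(row) for row in joint]            # input distribution P(x)
    pxz = [sum(row[lo:hi]) for row in joint]    # P(x, z) for this group
    pz = sum(pxz)
    return sum(p * math.log2(p / (px[x] * pz))
               for x, p in enumerate(pxz) if p > 0)

def quantize(joint, K):
    """Dynamic program over consecutive quantizers with K outputs.
    Returns (maximum mutual information, list of right boundaries)."""
    M = len(joint[0])
    NEG = float("-inf")
    S = [[NEG] * (M + 1) for _ in range(K + 1)]   # S[k][a] plays the role of S_k(a)
    back = [[0] * (M + 1) for _ in range(K + 1)]  # maximizing a' for each (k, a)
    S[0][0] = 0.0                                 # assumed base case
    for k in range(1, K + 1):
        for a in range(k, M + 1):
            for a_prev in range(k - 1, a):        # a' in {k-1, ..., a-1}
                cand = S[k - 1][a_prev] + partial_mi(joint, a_prev, a)
                if cand > S[k][a]:
                    S[k][a] = cand
                    back[k][a] = a_prev
    # Trace back the boundaries of the maximizing quantizer.
    bounds, a = [], M
    for k in range(K, 0, -1):
        bounds.append(a)
        a = back[k][a]
    bounds.reverse()
    return S[K][M], bounds
```

For clarity, `partial_mi` re-sums the group on every call, which adds a factor proportional to the number of outputs to the running time; precomputing cumulative sums of each row of `joint` would make each evaluation constant-time.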

For fixed $k$ and $a$, assume that $S_k(a)$ is the maximum
of partial mutual information, corresponding to the optimal
quantization of channel outputs 1 to $a$ to the quantizer output
groups 1 to $k$. Let $a'$ be the last element of group $k-1$, that
is,

$$S_k(a) = S_{k-1}(a') + i(a' \to a), \qquad (65)$$

so that group $k$ consists of $a'+1, \ldots, a$, that is,
$A_k = \{a'+1, \ldots, a\}$. Then, the quantizer for channel
outputs 1 to $a'$ must also be optimal. This is true because if
another quantizer of 1 to $a'$ produced higher mutual information,
then the quantization of 1 to $a$ would also have higher partial
mutual information, leading to a contradiction of the assumption
that 1 to $a$ was optimally quantized.

The above argument is sufficient to prove that the quantiza- 
tion problem has optimal substructure, and since the Quan- 
tization Algorithm exploits this structure, the algorithm is 
optimal. Along with the earlier statement that the complexity 
is proportional to $I^2$, the proof of the Theorem is completed.